Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-06 Thread SATYANARAYANA NARLAPURAM
On Thu, Jan 6, 2022 at 11:24 PM Jeff Davis  wrote:

> On Wed, 2022-01-05 at 23:59 -0800, SATYANARAYANA NARLAPURAM wrote:
> > I would like to propose a GUC send_Wal_after_quorum_committed which
> > when set to ON, walsenders corresponds to async standbys and logical
> > replication workers wait until the LSN is quorum committed on the
> > primary before sending it to the standby. This not only simplifies
> > the post failover steps but avoids unnecessary downtime for the async
> > replicas. Thoughts?
>
> Do we need a GUC? Or should we just always require that sync rep is
> satisfied before sending to async replicas?
>

I proposed a GUC to not introduce a behavior change by default. I have no
strong opinion on having a GUC or making the proposed behavior default,
would love to get others' perspectives as well.


>
> It feels like the sync quorum should always be ahead of the async
> replicas. Unless I'm missing a use case, or there is some kind of
> performance gotcha.
>

I couldn't think of a case that can cause serious performance issues but
will run some experiments on this and post the numbers.



>
> Regards,
> Jeff Davis
>
>
>


Re: preserve timestamps when installing headers

2022-01-06 Thread Peter Eisentraut

On 04.01.22 22:21, Tom Lane wrote:

However, there's another problem with using INSTALL_DATA as a solution
to this issue: why would you expect that to preserve timestamps?
install-sh won't.  I see that /usr/bin/install (which configure picks
on my RHEL box) won't preserve them by default, but it has a -p
option to do so.  I would not bet on that being portable to all of
the myriad of foo-install programs that configure will accept, though.


I don't think preserving timestamps should be the default behavior, but 
I would support organizing things so that additional options can be 
passed to "install" to make it do whatever the user prefers.  But that 
won't work if some installations don't go through install.


We could have some mode where "install" is used instead of "cp", if 
someone wants to figure out exactly how to make that determination.


Btw., a quick test of make -C src/include/ install:

cp (current code): 0.5 s
GNU install: 0.6 s
install-sh: 12.5 s




Re: row filtering for logical replication

2022-01-06 Thread Peter Smith
Below are some review comments for the v60 patches.

V60-0001 Review Comments


1. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
unnecessary parens

+ /*
+ * NOTE: Multiple publication row filters have already been combined to a
+ * single exprstate (for this pubaction).
+ */
+ if (entry->exprstate[changetype])
+ {
+ /* Evaluates row filter */
+ result = pgoutput_row_filter_exec_expr(entry->exprstate[changetype], ecxt);
+ }

This is a single statement so it may be better to rearrange by
removing the unnecessary parens and moving the comment.

e.g.

/*
* Evaluate the row filter.
*
* NOTE: Multiple publication row filters have already been combined to a
* single exprstate (for this pubaction).
*/
if (entry->exprstate[changetype])
result = pgoutput_row_filter_exec_expr(entry->exprstate[changetype], ecxt);


v60-0002 Review Comments


1. Commit Message

Some parts of this message could do with some minor re-wording. Here
are some suggestions:

1a.
BEFORE:
Also tuples that have been deformed will be cached in slots to avoid
multiple deforming of tuples.
AFTER:
Also, tuples that are deformed will be cached in slots to avoid unnecessarily
deforming again.

1b.
BEFORE:
However, after the UPDATE the new
tuple doesn't satisfy the row filter then, from the data consistency
perspective, that row should be removed on the subscriber.
AFTER:
However, after the UPDATE the new
tuple doesn't satisfy the row filter, so from a data consistency
perspective, that row should be removed on the subscriber.

1c.
BEFORE:
Keep this row on the subscriber is undesirable because it...
AFTER
Leaving this row on the subscriber is undesirable because it...

1d.
BEFORE:
However, after the UPDATE
the new tuple does satisfies the row filter then, from the data
consistency perspective, that row should inserted on the subscriber.
AFTER:
However, after the UPDATE
the new tuple does satisfy the row filter, so from a data
consistency perspective, that row should be inserted on the subscriber.

1e.
"Subsequent UPDATE or DELETE statements have no effect."

Why won't they have an effect? The first impression is the newly
updated tuple now matches the filter, I think this part seems to need
some more detailed explanation. I saw there are some slightly
different details in the header comment of the
pgoutput_row_filter_update_check function - does it help?

~~

2. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter decl

+static bool pgoutput_row_filter(enum ReorderBufferChangeType
changetype, EState *relation, Oid relid,
+ HeapTuple oldtuple, HeapTuple newtuple,
+ TupleTableSlot *slot, RelationSyncEntry *entry);

The 2nd parameter should be called "EState *estate" (not "EState *relation").

~~

3. src/backend/replication/pgoutput/pgoutput.c -
pgoutput_row_filter_update_check header comment

This function header comment looks very similar to an extract from the
0002 comment message. So any wording improvements made to the commit
message (see review comment #1) should be made here in this comment
too.

~~

4. src/backend/replication/pgoutput/pgoutput.c -
pgoutput_row_filter_update_check inconsistencies, typos, row-filter

4a. The comments here are mixing terms like "oldtuple" / "old tuple" /
"old_tuple", and "newtuple" / "new tuple" / "new_tuple". I feel it
would read better just saying "old tuple" and "new tuple" within the
comments.

4b. Typo: "row-filter" --> "row filter" (for consistency with every
other usage where the hyphen is removed)

~~

5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
unnecessary parens

+ /*
+ * The default behavior for UPDATEs is to use the new tuple for row
+ * filtering. If the UPDATE requires a transformation, the new tuple will
+ * be replaced by the transformed tuple before calling this routine.
+ */
+ if (newtuple || oldtuple)
+ ExecStoreHeapTuple(newtuple ? newtuple : oldtuple,
ecxt->ecxt_scantuple, false);
+ else
+ {
+ ecxt->ecxt_scantuple = slot;
+ }

The else is a single statement so the parentheses are not needed here.
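e.g. (the same code as above, just without the braces):

/*
 * The default behavior for UPDATEs is to use the new tuple for row
 * filtering. If the UPDATE requires a transformation, the new tuple will
 * be replaced by the transformed tuple before calling this routine.
 */
if (newtuple || oldtuple)
    ExecStoreHeapTuple(newtuple ? newtuple : oldtuple,
                       ecxt->ecxt_scantuple, false);
else
    ecxt->ecxt_scantuple = slot;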

~~

6. src/include/replication/logicalproto.h

+extern void logicalrep_write_update_cached(StringInfo out,
TransactionId xid, Relation rel,
+TupleTableSlot *oldtuple, TupleTableSlot *newtuple,
+bool binary);

This extern seems unused ???


Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-06 Thread Jeff Davis
On Wed, 2022-01-05 at 23:59 -0800, SATYANARAYANA NARLAPURAM wrote:
> I would like to propose a GUC send_Wal_after_quorum_committed which
> when set to ON, walsenders corresponds to async standbys and logical
> replication workers wait until the LSN is quorum committed on the
> primary before sending it to the standby. This not only simplifies
> the post failover steps but avoids unnecessary downtime for the async
> replicas. Thoughts?

Do we need a GUC? Or should we just always require that sync rep is
satisfied before sending to async replicas?

It feels like the sync quorum should always be ahead of the async
replicas. Unless I'm missing a use case, or there is some kind of
performance gotcha.

Regards,
Jeff Davis






Re: fix libpq comment

2022-01-06 Thread Michael Paquier
On Thu, Jan 06, 2022 at 09:33:07PM -0300, Euler Taveira wrote:
> While checking the PQping code I noticed that pg_ctl does not rely on PQping
> since commit f13ea95f9e4 (v10) so the attached patch removes a comment from
> internal_ping().

Looking at the area, the rest looks fine.  So, applied as per your
suggestion.  Thanks, Euler!
--
Michael


signature.asc
Description: PGP signature


Re: Add jsonlog log_destination for JSON server logs

2022-01-06 Thread Michael Paquier
On Thu, Jan 06, 2022 at 06:28:26PM -0500, Andrew Dunstan wrote:
> I have tested on an msys2 setup with your v8 patches and I am getting this:
> 
> #   Failed test 'current_logfiles is sane'
> #   at t/004_logrotate.pl line 96.
> #   'stderr log/postgresql-2022-01-06_222419.log
> # csvlog log/postgresql-2022-01-06_222419.csv

Yes, I was waiting for the latest results, but that did not help at
all.  Something is wrong with the patch; I am not sure what yet, but
it seems like a mistake in the backend part of it rather than the
tests.  I have switched the CF entry to waiting on author until this
is addressed.
--
Michael


signature.asc
Description: PGP signature


Re: Refactoring of compression options in pg_basebackup

2022-01-06 Thread Michael Paquier
On Thu, Jan 06, 2022 at 09:27:19AM -0500, Robert Haas wrote:
> Did you mean that -z would be a synonym for --compression-method=gzip?
> It doesn't really make sense for -Z to be that, unless it's also
> setting the compression level.

Yes, I meant "-z", not "-Z", to be a synonym of
--compression-method=gzip.  Sorry for the typo.

> My objection to --compress=$LEVEL is that the compression level seems
> like it ought to rightfully be subordinate to the choice of algorithm.
> In general, there's no reason why a compression algorithm has to offer
> a choice of compression levels at all, or why they have to be numbered
> 0 through 9. For example, lz4 on my system claims to offer compression
> levels from 1 through 12, plus a separate set of "fast" compression
> levels starting with 1 and going up to an unspecified number. And then
> it also has options to favor decompression speed, change the block
> size, and a few other parameters. We don't necessarily want to expose
> all of those options, but we should structure things so that we could
> if it became important. The way to do that is to make the compression
> algorithm the primary setting, and then anything else you can set for
> that compressor is somehow a subordinate setting.

For any compression method, that maps to an integer, so..  But I am
not going to fight hard on that.

> Put another way, we don't decide first that we want to compress with
> level 7, and then afterward decide whether that's gzip, lz4, or bzip2.
> We pick the compressor first, and then MAYBE think about changing the
> compression level.

Which is why things should be checked once all the options are
processed.  I'd recommend that you read the option patch a bit more,
that may help.

> Well what I was looking at was this:
> 
> - printf(_("  -Z, --compress=0-9 compress tar output with given
> compression level\n"));
> + printf(_("  -Z, --compress=1-9 compress tar output with given
> compression level\n"));
> + printf(_("  --compression-method=METHOD\n"
> + " method to compress data\n"));
> 
> That seems to show that, post-patch, the argument to -Z would be a
> compression level, even if --compression-method were something other
> than gzip.

Yes, after the patch --compress would be a compression level.  And, if
attempting to use with --compression-method set to "none", or
potentially "lz4", it would just fail.  If not using this "gzip", the
compression level is switched to Z_DEFAULT_COMPRESSION.  That's this
area of the patch, FWIW:
+   /*
+* Compression-related options.
+*/
+   switch (compression_method)
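Roughly speaking, the kind of post-processing check I mean looks like this
(only an illustrative sketch with simplified names, not the patch's actual
code):

switch (compression_method)
{
    case COMPRESSION_NONE:
        if (compresslevel != 0)
        {
            pg_log_error("cannot use a compression level with method \"none\"");
            exit(1);
        }
        break;
    case COMPRESSION_GZIP:
        if (compresslevel == 0)
            compresslevel = Z_DEFAULT_COMPRESSION;
        break;
}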

> It's possible that I haven't read something carefully enough, but to
> me, what I said seems to be a straightforward conclusion based on
> looking at the usage help in the patch. So if I came to the wrong
> conclusion, perhaps that usage help isn't reflecting the situation you
> intend to create, or not as clearly as it ought.

Perhaps the --help output could be clearer, then.  Do you have a
suggestion?

Bringing walmethods.c onto the same page for the directory and the tar
methods was my primary goal here, and the tests are a bonus, so I've
applied this part for now, leaving pg_basebackup alone until we figure
out the layer of options we should use.  Perhaps it would be better to
revisit that stuff once the server-side compression has landed.
--
Michael


signature.asc
Description: PGP signature


Re: row filtering for logical replication

2022-01-06 Thread Amit Kapila
On Fri, Jan 7, 2022 at 9:44 AM Amit Kapila  wrote:
>
> On Thu, Jan 6, 2022 at 6:42 PM Euler Taveira  wrote:
> >
> > IMO we shouldn't reuse ReorderBufferChangeType. For a long-term solution, 
> > it is
> > fragile. ReorderBufferChangeType has values that do not matter for row 
> > filter
> > and it relies on the fact that REORDER_BUFFER_CHANGE_INSERT,
> > REORDER_BUFFER_CHANGE_UPDATE and REORDER_BUFFER_CHANGE_DELETE are the first 
> > 3
> > values from the enum, otherwise, it breaks rfnodes and no_filters in
> > pgoutput_row_filter().
> >
>
> I think you mean to say it will break in pgoutput_row_filter_init(). I
> see your point but OTOH, if we do what you are suggesting then don't
> we need an additional mapping between ReorderBufferChangeType and
> RowFilterPublishAction as row filter and pgoutput_change API need to
> use those values.
>

Can't we use 0, 1, 2 as indexes for rfnodes/no_filters based on the
change type, as they are local variables? That would avoid the
fragility you are worried about. I am slightly hesitant to introduce a
new enum when we are already using the reorder buffer change type in
pgoutput.c.
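Just to illustrate what I mean (a sketch only, not the patch's actual
code), the local index could be derived explicitly from the change type
instead of relying on the enum's numeric values:

int         idx;

switch (change->action)
{
    case REORDER_BUFFER_CHANGE_INSERT:
        idx = 0;
        break;
    case REORDER_BUFFER_CHANGE_UPDATE:
        idx = 1;
        break;
    case REORDER_BUFFER_CHANGE_DELETE:
        idx = 2;
        break;
    default:
        Assert(false);
        idx = -1;               /* keep the compiler quiet */
        break;
}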

-- 
With Regards,
Amit Kapila.




Re: Index-only scan for btree_gist turns bpchar to char

2022-01-06 Thread Japin Li


On Fri, 07 Jan 2022 at 03:21, Tom Lane  wrote:
> I looked at this and it does seem like it might work, as per attached
> patch.  The one thing that is troubling me is that the opclass is set
> up to apply gbt_text_same, which is formally the Wrong Thing for bpchar,
> because the equality semantics shouldn't be quite the same.  But we
> could not fix that without a module version bump, which is annoying.
> I think that it might not be necessary to change it, because
>
> (1) There's no such thing as unique GIST indexes, so it should not
> matter if the "same" function is a bit stricter than the datatype's
> nominal notion of equality.  It's certainly okay for that to vary
> from the semantics applied by the consistent function --- GIST has
> no idea that the consistent function is allegedly testing equality.
>
> (2) If all the input values for a column have been coerced to the same
> typmod, then it doesn't matter because two values that are equal after
> space-stripping would be equal without space-stripping, too.
>
> However, (2) doesn't hold for an existing index that the user has failed
> to REINDEX, because then the index would contain some space-stripped
> values that same() will not say are equal to incoming new values.
> Again, I think this doesn't matter much, but maybe I'm missing
> something.  I've not really dug into what GIST uses the same()
> function for.
>
> In any case, if we do need same() to implement the identical
> behavior to bpchareq(), then the other solution isn't sufficient
> either.
>
> So in short, it seems like we ought to do some compatibility testing
> and see if this code misbehaves at all with an index created by the
> old code.  I don't particularly want to do that ... any volunteers?
>

Thanks for your patch, it looks good to me.  I'm not sure how to test this.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Skipping logical replication transactions on subscriber side

2022-01-06 Thread Masahiko Sawada
On Fri, Jan 7, 2022 at 10:04 AM Masahiko Sawada  wrote:
>
> On Wed, Jan 5, 2022 at 12:31 PM Amit Kapila  wrote:
> >
> > On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Dec 16, 2021 at 2:42 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Thu, Dec 16, 2021 at 10:37 AM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila 
> > > > > >  wrote:
> > > > > > >
> > > > > > > I thought we just want to lock before clearing the skip_xid 
> > > > > > > something
> > > > > > > like take the lock, check if the skip_xid in the catalog is the 
> > > > > > > same
> > > > > > > as we have skipped, if it is the same then clear it, otherwise, 
> > > > > > > leave
> > > > > > > it as it is. How will that disallow users to change skip_xid when 
> > > > > > > we
> > > > > > > are skipping changes?
> > > > > >
> > > > > > Oh I thought we wanted to keep holding the lock while skipping 
> > > > > > changes
> > > > > > (changing skip_xid requires acquiring the lock).
> > > > > >
> > > > > > So if skip_xid is already changed, the apply worker would do
> > > > > > replorigin_advance() with WAL logging, instead of committing the
> > > > > > catalog change?
> > > > > >
> > > > >
> > > > > Right. BTW, how are you planning to advance the origin? Normally, a
> > > > > commit transaction would do it but when we are skipping all changes,
> > > > > the commit might not do it as there won't be any transaction id
> > > > > assigned.
> > > >
> > > > I've not tested it yet but replorigin_advance() with wal_log = true
> > > > seems to work for this case.
> > >
> > > I've tested it and realized that we cannot use replorigin_advance()
> > > for this purpose without changes. That is, the current
> > > replorigin_advance() doesn't allow to advance the origin by the owner:
> > >
> > > /* Make sure it's not used by somebody else */
> > > if (replication_state->acquired_by != 0)
> > > {
> > > ereport(ERROR,
> > > (errcode(ERRCODE_OBJECT_IN_USE),
> > >  errmsg("replication origin with OID %d is already
> > > active for PID %d",
> > > replication_state->roident,
> > > replication_state->acquired_by)));
> > > }
> > >
> > > So we need to change it so that the origin owner can advance its
> > > origin, which makes sense to me.
> > >
> > > Also, when we have to update the origin instead of committing the
> > > catalog change while updating the origin, we cannot record the origin
> > > timestamp.
> > >
> >
> > Is it because we currently update the origin timestamp with commit record?
>
> Yes.
>
> >
> > > This behavior makes sense to me because we skipped the
> > > transaction. But ISTM it’s not good if we emit the origin timestamp
> > > only when directly updating the origin. So probably we need to always
> > > omit origin timestamp.
> > >
> >
> > Do you mean to say that you want to omit it even when we are
> > committing the changes?
>
> Yes, it would be better to record only origin lsn in terms of consistency.
>
> >
> > > Apart from that, I'm vaguely concerned that the logic seems to be
> > > getting complex. Probably it comes from the fact that we store
> > > skip_xid in the catalog and update the catalog to clear/set the
> > > skip_xid. It might be worth revisiting the idea of storing skip_xid on
> > > shmem (e.g., ReplicationState)?
> > >
> >
> > IIRC, the problem with that idea was that we won't remember skip_xid
> > information after server restart and the user won't even know that it
> > has to set it again.
>
> Right, I agree that it’s not convenient when the server restarts or
> crashes, but these problems could not be critical in the situation
> where users have to use this feature; the subscriber already entered
> an error loop so they can know xid again and it’s an uncommon case
> that they need to restart during skipping changes.
>
> Anyway, I'll submit an updated patch soon so we can discuss complexity
> vs. convenience.

Attached an updated patch. Please review it.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v2-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch
Description: Binary data


Re: RFC: Logging plan of the running query

2022-01-06 Thread torikoshia

On 2021-11-26 12:39, torikoshia wrote:
Since the patch could not be applied to the HEAD anymore, I also 
updated it.


Updated the patch to fix a compiler warning about the format on
Windows.



--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From b8367e22d7a9898e4b85627ba8c203be273fc22f Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 7 Jan 2022 12:31:03 +0900
Subject: [PATCH v15] Add function to log the untruncated query string and its
 plan for the query currently running on the backend with the specified
 process ID.

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries in production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans, because allowing any user to issue this request at an
unbounded rate would cause lots of log messages, which can
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Since some codes, tests and comments of
pg_log_query_plan() are the same with
pg_log_backend_memory_contexts(), this patch also refactors
them to make them common.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby

---
 doc/src/sgml/func.sgml   |  45 +++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 117 ++-
 src/backend/executor/execMain.c  |  10 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/ipc/signalfuncs.c|  55 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   7 ++
 src/backend/utils/adt/mcxtfuncs.c|  36 +-
 src/backend/utils/init/globals.c |   1 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   3 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/lock.h   |   2 -
 src/include/storage/procsignal.h |   1 +
 src/include/storage/signalfuncs.h|  22 
 src/include/tcop/pquery.h|   1 +
 src/test/regress/expected/misc_functions.out |  54 +++--
 src/test/regress/sql/misc_functions.sql  |  42 +--
 19 files changed, 355 insertions(+), 63 deletions(-)
 create mode 100644 src/include/storage/signalfuncs.h

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e58efce586..9804574c10 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25430,6 +25430,26 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
+backend with the specified process ID along with the untruncated
+query string.
+They will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+   
+  
+
   

 
@@ -25543,6 +25563,31 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+The format of the query plan is the same as when VERBOSE,
+COSTS, SETTINGS and
+FORMAT TEXT are used in the EXPLAIN
+command. For example:
+
+LOG:  plan of the query running on backend with PID 17793 is:
+Query Text: SELECT * FROM pgbench_accounts;
+Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=2000000 width=97)
+  Output: aid, bid, abalance, filler
+Settings: work_mem = '1MB'
+
+Note that nested statements (statements executed inside a function) are not
+considered for logging. Only the plan of the most deeply nested query is logged.
+   
+
   
 
   
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 3a4fa9091b..173e268be3 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
 
 REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
 
+REVOKE 

Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-06 Thread Fujii Masao




On 2022/01/06 19:24, Simon Riggs wrote:

On Thu, 30 Dec 2021 at 13:19, Maxim Orlov  wrote:


Your opinions are very much welcome!


This is a review of the Int64 options patch,
"v6-0001-Add-64-bit-GUCs-for-xids.patch"


Do we really need to support both int32 and int64 options? Isn't it enough to
replace the existing int32 option with an int64 one? Or how about using a
string-type option for very large numbers like a 64-bit XID, as is done for
recovery_target_xid?
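For example, a string-type option could be validated roughly like this
(just a sketch to show the idea; the function name is made up, not taken
from the patch):

#include "postgres.h"
#include <errno.h>
#include "utils/guc.h"

static bool
check_xid64_string(char **newval, void **extra, GucSource source)
{
    char       *endptr;
    uint64      xid;

    errno = 0;
    xid = strtoull(*newval, &endptr, 10);
    if (errno == ERANGE || endptr == *newval || *endptr != '\0')
    {
        GUC_check_errdetail("\"%s\" is not a valid 64-bit transaction ID.",
                            *newval);
        return false;
    }
    (void) xid;                 /* a real hook would pass the value on via *extra */
    return true;
}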

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: Support tab completion for upper character inputs in psql

2022-01-06 Thread tanghy.f...@fujitsu.com
On Friday, January 7, 2022 1:08 PM, Japin Li  wrote:
> +/*
> + * pg_string_tolower - Fold a string to lower case if the string is not 
> quoted
> + * and only contains ASCII characters.
> + * For German/Turkish etc text, no change will be made.
> + *
> + * The returned value has to be freed.
> + */
> +static char *
> +pg_string_tolower_if_ascii(const char *text)
> +{
> 
> s/pg_string_tolower/pg_string_tolower_if_ascii/ for comments.
> 

Thanks for your review.
Comment fixed in the attached V11 patch.

Regards,
Tang


v11-0001-Support-tab-completion-with-a-query-result-for-u.patch
Description:  v11-0001-Support-tab-completion-with-a-query-result-for-u.patch


Re: Map WAL segment files on PMEM as WAL buffers

2022-01-06 Thread Justin Pryzby
On Fri, Jan 07, 2022 at 12:50:01PM +0900, Takashi Menjo wrote:
> > But in this case it really doesn't work :(
> >
> > running bootstrap script ... 2022-01-05 23:17:30.244 CST [12088] FATAL:  
> > file not on PMEM: path "pg_wal/00010001"
> 
> Do you have a real PMEM device such as NVDIMM-N or Intel Optane PMem?

No - the point is that we'd like to have a way to exercise this patch on the
cfbot.  Particularly the new code introduced by this patch, not just the
--without-pmem case...

I was able to make this pass "make check" by adding this to main() in
src/backend/main/main.c:
| setenv("PMEM_IS_PMEM_FORCE", "1", 0);

I think you should add a patch which does what Thomas suggested: 1) add to
./.cirrus.yaml installation of the libpmem package for debian/bsd/mac/windows;
2) add setenv to main(), as above; 3) change configure.ac and guc.c to default
to --with-libpmem and wal_pmem_map=on.  This should be the last patch, for
cfbot only, not meant to be merged.

You can test that the package installation part works before mailing patches to
the list with the instructions here:

src/tools/ci/README:
Enabling cirrus-ci in a github repository..

> If you don't, you have two alternatives below. Note that neither of
> them ensures durability. Each of them is just for testing.
> 2. Set the environment variable PMEM_IS_PMEM_FORCE=1 to tell libpmem
> to treat any devices as if they were PMEM.

The next revision should surely squish all the fixes into their corresponding
patches.  Each of the patches ought to compile and pass tests
without depending on the "following" patches:  0001 without 0002-, 0001-0002
without 0003-, etc.

-- 
Justin




Re: row filtering for logical replication

2022-01-06 Thread Peter Smith
On Wed, Jan 5, 2022 at 1:05 PM wangw.f...@fujitsu.com
 wrote:
>
> On Thu, Jan 4, 2022 at 00:54 PM Peter Smith  wrote:
> > Modified in v58 [1] as suggested
> Thanks for updating the patches.
> A few comments about v58-0001 and v58-0002.
>
> v58-0001
> 1.
> How about modifying the following loop in copy_table by using for_each_from
> instead of foreach?
> Like the invocation of for_each_from in function get_rule_expr.
> from:
> if (qual != NIL)
> {
> ListCell   *lc;
> boolfirst = true;
>
> appendStringInfoString(&cmd, " WHERE ");
> foreach(lc, qual)
> {
> char   *q = strVal(lfirst(lc));
>
> if (first)
> first = false;
> else
> appendStringInfoString(&cmd, " OR ");
> appendStringInfoString(&cmd, q);
> }
> list_free_deep(qual);
> }
> change to:
> if (qual != NIL)
> {
> ListCell   *lc;
> char   *q = strVal(linitial(qual));
>
> appendStringInfo(&cmd, " WHERE %s", q);
> for_each_from(lc, qual, 1)
> {
> q = strVal(lfirst(lc));
> appendStringInfo(&cmd, " OR %s", q);
> }
> list_free_deep(qual);
> }
>
> 2.
> I find the API of get_rel_sync_entry is modified.
> -get_rel_sync_entry(PGOutputData *data, Oid relid)
> +get_rel_sync_entry(PGOutputData *data, Relation relation)
> It looks like just moving the invocation of RelationGetRelid from outside into
> function get_rel_sync_entry. I am not sure whether this modification is
> necessary to this feature or not.
>
> v58-0002
> 1.
> In function pgoutput_row_filter_init, if no_filter is set, I think we do not
> need to add row filter to list(rfnodes).
> So how about changing three conditions when add row filter to rfnodes like 
> this:
> -   if (pub->pubactions.pubinsert)
> +   if (pub->pubactions.pubinsert && 
> !no_filter[idx_ins])
> {
> rfnode = 
> stringToNode(TextDatumGetCString(rfdatum));
> rfnodes[idx_ins] = 
> lappend(rfnodes[idx_ins], rfnode);
> }
>

I think currently there is no harm done with the current code because
even if there was no_filter[xxx], any gathered rfnodes[xxx] will be
cleaned up and ignored later anyway, so this change is not really
necessary.

OTOH your suggestion could be a tiny bit more efficient in some cases
if there are many publications, so LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Skipping logical replication transactions on subscriber side

2022-01-06 Thread Amit Kapila
On Fri, Jan 7, 2022 at 6:35 AM Masahiko Sawada  wrote:
>
> On Wed, Jan 5, 2022 at 12:31 PM Amit Kapila  wrote:
> >
> > On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Dec 16, 2021 at 2:42 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Thu, Dec 16, 2021 at 10:37 AM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila 
> > > > > >  wrote:
> > > > > > >
> > > > > > > I thought we just want to lock before clearing the skip_xid 
> > > > > > > something
> > > > > > > like take the lock, check if the skip_xid in the catalog is the 
> > > > > > > same
> > > > > > > as we have skipped, if it is the same then clear it, otherwise, 
> > > > > > > leave
> > > > > > > it as it is. How will that disallow users to change skip_xid when 
> > > > > > > we
> > > > > > > are skipping changes?
> > > > > >
> > > > > > Oh I thought we wanted to keep holding the lock while skipping 
> > > > > > changes
> > > > > > (changing skip_xid requires acquiring the lock).
> > > > > >
> > > > > > So if skip_xid is already changed, the apply worker would do
> > > > > > replorigin_advance() with WAL logging, instead of committing the
> > > > > > catalog change?
> > > > > >
> > > > >
> > > > > Right. BTW, how are you planning to advance the origin? Normally, a
> > > > > commit transaction would do it but when we are skipping all changes,
> > > > > the commit might not do it as there won't be any transaction id
> > > > > assigned.
> > > >
> > > > I've not tested it yet but replorigin_advance() with wal_log = true
> > > > seems to work for this case.
> > >
> > > I've tested it and realized that we cannot use replorigin_advance()
> > > for this purpose without changes. That is, the current
> > > replorigin_advance() doesn't allow to advance the origin by the owner:
> > >
> > > /* Make sure it's not used by somebody else */
> > > if (replication_state->acquired_by != 0)
> > > {
> > > ereport(ERROR,
> > > (errcode(ERRCODE_OBJECT_IN_USE),
> > >  errmsg("replication origin with OID %d is already
> > > active for PID %d",
> > > replication_state->roident,
> > > replication_state->acquired_by)));
> > > }
> > >
> > > So we need to change it so that the origin owner can advance its
> > > origin, which makes sense to me.
> > >
> > > Also, when we have to update the origin instead of committing the
> > > catalog change while updating the origin, we cannot record the origin
> > > timestamp.
> > >
> >
> > Is it because we currently update the origin timestamp with commit record?
>
> Yes.
>
> >
> > > This behavior makes sense to me because we skipped the
> > > transaction. But ISTM it’s not good if we emit the origin timestamp
> > > only when directly updating the origin. So probably we need to always
> > > omit origin timestamp.
> > >
> >
> > Do you mean to say that you want to omit it even when we are
> > committing the changes?
>
> Yes, it would be better to record only origin lsn in terms of consistency.
>

I am not so sure about this point, because then it is not clear what purpose
the origin timestamp will serve in the code.

> >
> > > Apart from that, I'm vaguely concerned that the logic seems to be
> > > getting complex. Probably it comes from the fact that we store
> > > skip_xid in the catalog and update the catalog to clear/set the
> > > skip_xid. It might be worth revisiting the idea of storing skip_xid on
> > > shmem (e.g., ReplicationState)?
> > >
> >
> > IIRC, the problem with that idea was that we won't remember skip_xid
> > information after server restart and the user won't even know that it
> > has to set it again.
>
> Right, I agree that it’s not convenient when the server restarts or
> crashes, but these problems could not be critical in the situation
> where users have to use this feature; the subscriber already entered
> an error loop so they can know xid again and it’s an uncommon case
> that they need to restart during skipping changes.
>
> Anyway, I'll submit an updated patch soon so we can discuss complexity
> vs. convenience.
>

Okay, that sounds reasonable.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-01-06 Thread Amit Kapila
On Thu, Jan 6, 2022 at 6:42 PM Euler Taveira  wrote:
>
> On Thu, Jan 6, 2022, at 1:18 AM, Amit Kapila wrote:
>
> On Thu, Jan 6, 2022 at 8:43 AM Peter Smith  wrote:
> >
> > On Wed, Jan 5, 2022 at 9:52 PM Amit Kapila  wrote:
> > >
> > ...
> >
> > > Another minor comment:
> > > +static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype,
> > >
> > > Do we need to specify the 'enum' type before changetype parameter?
> > >
> >
> > That is because there is currently no typedef for the enum
> > ReorderBufferChangeType.
> >
>
> But I see that the 0002 patch is already adding the required typedef.
>
> IMO we shouldn't reuse ReorderBufferChangeType. For a long-term solution, it 
> is
> fragile. ReorderBufferChangeType has values that do not matter for row filter
> and it relies on the fact that REORDER_BUFFER_CHANGE_INSERT,
> REORDER_BUFFER_CHANGE_UPDATE and REORDER_BUFFER_CHANGE_DELETE are the first 3
> values from the enum, otherwise, it breaks rfnodes and no_filters in
> pgoutput_row_filter().
>

I think you mean to say it will break in pgoutput_row_filter_init(). I
see your point but OTOH, if we do what you are suggesting then don't
we need an additional mapping between ReorderBufferChangeType and
RowFilterPublishAction as row filter and pgoutput_change API need to
use those values.

-- 
With Regards,
Amit Kapila.




Re: Support tab completion for upper character inputs in psql

2022-01-06 Thread Japin Li


On Fri, 07 Jan 2022 at 10:12, tanghy.f...@fujitsu.com  
wrote:
> On Thursday, January 6, 2022 11:57 PM, Tom Lane  wrote:
>>
>> Peter Eisentraut  writes:
>> > So the ordering of the suggested completions is different.  I don't know
>> > offhand how that ordering is determined.  Perhaps it's dependent on
>> > locale, readline version, or operating system.  In any case, we need to
>> > figure this out to make this test stable.
>>
>> I don't think we want to get into the business of trying to make that
>> consistent across different readline/libedit versions.  How about
>> adjusting the test case so that only one enum value is to be printed?
>>
>
> Thanks for your suggestion. Agreed.
> Fixed the test case to show only one enum value.
>

+/*
+ * pg_string_tolower - Fold a string to lower case if the string is not quoted
+ * and only contains ASCII characters.
+ * For German/Turkish etc text, no change will be made.
+ *
+ * The returned value has to be freed.
+ */
+static char *
+pg_string_tolower_if_ascii(const char *text)
+{

s/pg_string_tolower/pg_string_tolower_if_ascii/ for comments.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Map WAL segment files on PMEM as WAL buffers

2022-01-06 Thread Takashi Menjo
Hi Justin,

Thank you for your build test and comments. The v7 patchset attached
to this email fixes the issues you reported.


> The cfbot showed issues compiling on linux and windows.
> http://cfbot.cputube.org/takashi-menjo.html
>
> https://cirrus-ci.com/task/6125740327436288
> [02:30:06.538] In file included from xlog.c:38:
> [02:30:06.538] ../../../../src/include/access/xlogpmem.h:32:42: error: 
> unknown type name ‘tli’
> [02:30:06.538]32 | PmemXLogEnsurePrevMapped(XLogRecPtr ptr, tli)
> [02:30:06.538]   |  ^~~
> [02:30:06.538] xlog.c: In function ‘GetXLogBuffer’:
> [02:30:06.538] xlog.c:1959:19: warning: implicit declaration of function 
> ‘PmemXLogEnsurePrevMapped’ [-Wimplicit-function-declaration]
> [02:30:06.538]  1959 |openLogSegNo = PmemXLogEnsurePrevMapped(endptr, 
> tli);
>
> https://cirrus-ci.com/task/6688690280857600?logs=build#L379
> [02:33:25.752] c:\cirrus\src\include\access\xlogpmem.h(33,1): error C2081: 
> 'tli': name in formal parameter list illegal (compiling source file 
> src/backend/access/transam/xlog.c) [c:\cirrus\postgres.vcxproj]
>
> I'm attaching a probable fix.  Unfortunately, for patches like this, most of
> the functionality isn't exercised unless the library is installed and
> compilation and runtime are enabled by default.

I got the same error when building without --with-libpmem. Your fix looks
reasonable. My v7-0008 fixes this error.


> In 0009: recaluculated => recalculated

v7-0011 fixes this typo.


> 0010-Update-document should be squished with 0003-Add-wal_pmem_map-to-GUC (and
> maybe 0002 and 0001).  I believe the patches after 0005 are more WIP, so it's
> fine if they're not squished yet.

As you say, the patch updating the documentation should be merged into a
related fix, probably "Add wal_pmem_map to GUC".  For now I want it to be a
separate patch (v7-0014).


> I'm not sure what the point is of this one: 
> 0008-Let-wal_pmem_map-be-constant-unl

If USE_LIBPMEM is not defined (that is, no --with-libpmem),
wal_pmem_map is always false and essentially never used. Using
#if(n)def everywhere is not good for code readability, so I made
wal_pmem_map a constant. This may help compilers optimize away
conditional branches.
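In other words, the shape of it is roughly this (a simplified sketch, not
the exact declarations in the patch):

#ifdef USE_LIBPMEM
extern bool wal_pmem_map;           /* GUC, can be turned on at runtime */
#else
#define wal_pmem_map false          /* constant, so "if (wal_pmem_map)"
                                     * branches can be optimized away */
#endif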

v7-0005 adds the comment above.


> +   ereport(ERROR,
> +   (errcode_for_file_access(),
> +errmsg("could not pmem_map_file \"%s\": %m", 
> path)));
>
> => The outer parenthesis are not needed since e3a87b4.

v7-0009 fixes this.


> But in this case it really doesn't work :(
>
> running bootstrap script ... 2022-01-05 23:17:30.244 CST [12088] FATAL:  file 
> not on PMEM: path "pg_wal/00010001"

Do you have a real PMEM device such as NVDIMM-N or Intel Optane PMem?
If so, please use a PMEM mounted with Filesystem DAX option for
pg_wal, or the FATAL error will occur.

If you don't, you have two alternatives below. Note that neither of
them ensures durability. Each of them is just for testing.

1. Emulate PMEM with memmap=nn[KMG]!ss[KMG]. This can be used only on
Linux. Please see [1][2] for details; or
2. Set the environment variable PMEM_IS_PMEM_FORCE=1 to tell libpmem
to treat any devices as if they were PMEM.


Regards,
Takashi


[1] 
https://www.intel.com/content/www/us/en/developer/articles/training/how-to-emulate-persistent-memory-on-an-intel-architecture-server.html
[2] https://nvdimm.wiki.kernel.org/

-- 
Takashi Menjo 


v7-0004-Let-wal_pmem_map-be-constant-unless-with-libpmem.patch
Description: Binary data


v7-0002-Support-build-with-MSVC-on-Windows.patch
Description: Binary data


v7-0001-Add-with-libpmem-option-for-PMEM-support.patch
Description: Binary data


v7-0005-Comment-for-constant-wal_pmem_map.patch
Description: Binary data


v7-0003-Add-wal_pmem_map-to-GUC.patch
Description: Binary data


v7-0006-Export-InstallXLogFileSegment.patch
Description: Binary data


v7-0008-Fix-invalid-declaration-of-PmemXLogEnsurePrevMapp.patch
Description: Binary data


v7-0007-Map-WAL-segment-files-on-PMEM-as-WAL-buffers.patch
Description: Binary data


v7-0009-Remove-redundant-parentheses-from-ereport-call.patch
Description: Binary data


v7-0011-Fix-typo-in-comment.patch
Description: Binary data


v7-0013-WAL-statistics-in-cases-of-wal_pmem_map-true.patch
Description: Binary data


v7-0012-Compatible-to-Windows.patch
Description: Binary data


v7-0010-Ensure-WAL-mappings-before-assertion.patch
Description: Binary data


v7-0014-Update-document.patch
Description: Binary data


v7-0015-Preallocate-and-initialize-more-WAL-if-wal_pmem_m.patch
Description: Binary data


Re: Deduplicate min restart_lsn calculation code

2022-01-06 Thread Bharath Rupireddy
On Thu, Jan 6, 2022 at 11:54 PM Alvaro Herrera  wrote:
>
> On 2022-Jan-06, Bharath Rupireddy wrote:
>
> > Hi,
> >
> > It seems like the two functions ReplicationSlotsComputeRequiredLSN and
> > ReplicationSlotsComputeLogicalRestartLSN more or less does the same
> > thing which makes me optimize (saving 40 LOC) it as attached. I'm
> > pretty much okay if it gets rejected on the grounds that it creates a
> > lot of diff with the older versions and the new API may not look
> > nicer, still I want to give it a try.
> >
> > Thoughts?
>
> Hmm, it seems sensible to me.  But I would not have the second boolean
> argument in the new function, and instead have the caller save the
> return value in a local variable to do the XLogSetReplicationSlotMinimumLSN
> step separately.  Then the new function API is not so strange.

Thanks for taking a look at it. Here's the v2 patch, please review.
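In other words, the suggested calling pattern is roughly this (a sketch only;
the compute function's name here is just a placeholder, not necessarily what
the patch uses):

XLogRecPtr  min_required;

min_required = ReplicationSlotsComputeMinRequiredLSN();    /* placeholder name */
XLogSetReplicationSlotMinimumLSN(min_required);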

Regards,
Bharath Rupireddy.


v2-0001-deduplicate-min-restart_lsn-calculation-code.patch
Description: Binary data


Re: Bugs in pgoutput.c

2022-01-06 Thread Amit Kapila
On Fri, Jan 7, 2022 at 1:28 AM Tom Lane  wrote:
>
> Amit Kapila  writes:
>
> > + * *during* a callback if we do any syscache or table access in the
> > + * callback.
>
> > As we don't take locks on tables, can invalidation events be accepted
> > during table access? I could be missing something but I see relation.c
> > accepts invalidation messages only when lock mode is not 'NoLock'.
>
> The core point here is that you're assuming that NO code path taken
> during logical decoding would try to take a lock.  I don't believe it,
> at least not unless you can point me to some debugging cross-check that
> guarantees it.
>

AFAIK, currently, there is no such debugging cross-check in the locking
APIs, but we can add one to ensure that we don't acquire a lock on
user-defined tables during logical decoding. As pointed out by Robert, I
also don't think accessing user tables will work during logical
decoding.
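For example, such a cross-check could be as simple as an assertion in the
lock acquisition path, along these lines (purely illustrative; no such
check exists today):

/* hypothetical: e.g. near the top of LockAcquireExtended() */
Assert(!HistoricSnapshotActive() ||
       locktag->locktag_type != LOCKTAG_RELATION ||
       IsCatalogRelationOid((Oid) locktag->locktag_field2));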

> Given that we're interested in historic not current snapshots, I can
> buy that it might be workable to manage syscache invalidations totally
> differently than the way it's done in normal processing, in which case
> (*if* it's done like that) maybe no invals would need to be recognized
> while an output plugin is executing.  But (a) the comment here is
> entirely wrong if that's so, and (b) I don't see anything in inval.c
> that makes it work differently.
>

I think we need invalidations to work in the output plugin to ensure that
the RelationSyncEntry has correct information.

-- 
With Regards,
Amit Kapila.




Re: Add spin_delay() implementation for Arm in s_lock.h

2022-01-06 Thread Andres Freund
On 2022-01-06 22:23:38 -0500, Tom Lane wrote:
> No; there's just one spinlock.  I'm re-purposing the spinlock that
> test_shm_mq uses to protect its setup operations (and thereafter
> ignores).

Oh, sorry, misread :(


> AFAICS the N+1 shm_mq instances don't internally contain
> spinlocks; they all use atomic ops.

They contain spinlocks too, and the naming is similar enough that I got
confused:
struct shm_mq
{
slock_t mq_mutex;

We don't use them for all that much anymore though...

Greetings,

Andres Freund




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-06 Thread Alexander Korotkov
On Thu, Jan 6, 2022 at 4:15 PM Finnerty, Jim  wrote:
> (Maxim) Re: -- If after upgrade page has no free space for special data, 
> tuples are
>converted to "double xmax" format: xmin became virtual
>FrozenTransactionId, xmax occupies the whole 64bit.
>Page converted to new format when vacuum frees enough space.
>
> A better way would be to prepare the database for conversion to the 64-bit 
> XID format before the upgrade so that it ensures that every page has enough 
> room for the two new epochs (E bits).
>
> 1. Enforce the rule that no INSERT or UPDATE to an existing page will leave 
> less than E bits of free space on a heap page
>
> 2. Run an online and restartable task, analogous to pg_repack, that rewrites 
> and splits any page that has less than E bits of free space. This needs to be 
> run on all non-temp tables in all schemas in all databases.  DDL operations 
> are not allowed on a target table while this operation runs, which is 
> enforced by taking an ACCESS SHARE lock on each table while the process is 
> running. To mitigate the effects of this restriction, the restartable task 
> can be restricted to run only in certain hours.  This could be implemented as 
> a background maintenance task that runs for X hours as of a certain time of 
> day and then kicks itself off again in 24-X hours, logging its progress.
>
> When this task completes, the database is ready for upgrade to 64-bit XIDs, 
> and there is no possibility that any page has insufficient free space for the 
> special data.
>
> Would you agree that this approach would completely eliminate the need for a 
> "double xmax" representation?

The "prepare" approach was the first tried.
https://github.com/postgrespro/pg_pageprep
But it appears to be very difficult and unreliable.  After investing
many months into pg_pageprep, "double xmax" approach appears to be
very fast to implement and reliable.

--
Regards,
Alexander Korotkov




Re: [PATCH] Add extra statistics to explain for Nested Loop

2022-01-06 Thread Lukas Fittl
On Sun, Nov 21, 2021 at 8:55 PM Justin Pryzby  wrote:

> I'm curious to hear what you and others think of the refactoring.
>
> It'd be nice if there's a good way to add a test case for verbose output
> involving parallel workers, but the output is unstable ...
>

I've reviewed this patch, and it works as expected - the refactoring
changes by Justin also appear to make sense to me.

I've briefly thought whether this needs documentation (currently the patch
includes none), but there does not appear to be a good place to add
documentation about this from a quick glance, so it seems acceptable to
leave this out given the lack of more detailed EXPLAIN documentation in
general.

The one item that still feels a bit open to me is benchmarking, based on
Andres' comment a while ago:

On Mon, Oct 19, 2020 at 4:20 PM Andres Freund  wrote:

> I'm a bit worried that further increasing the size of struct
> Instrumentation will increase the overhead of EXPLAIN ANALYZE further -
> in some workloads we spend a fair bit of time in code handling that. It
> would be good to try to find a few bad cases, and see what the overhead is.
>

Whilst no specific bad cases were provided, I wonder if even a simple
pgbench with auto_explain (and log_analyze=1) would be a way to test this?

The overhead of the Instrumentation struct size should show regardless of
whether a plan actually includes a Nested Loop.

Thanks,
Lukas

-- 
Lukas Fittl


Re: Add spin_delay() implementation for Arm in s_lock.h

2022-01-06 Thread Tom Lane
Andres Freund  writes:
> These separate shm_mq instances forward messages in a circle,
> "leader"->worker_1->worker_2->...->"leader". So there isn't a single contended
> spinlock, but a bunch of different spinlocks, each with at most two backends
> accessing it?

No; there's just one spinlock.  I'm re-purposing the spinlock that
test_shm_mq uses to protect its setup operations (and thereafter
ignores).  AFAICS the N+1 shm_mq instances don't internally contain
spinlocks; they all use atomic ops.

(Well, on crappy architectures maybe there's spinlocks underneath
the atomic ops, but I don't think we care about such cases here.)

regards, tom lane




Re: Add spin_delay() implementation for Arm in s_lock.h

2022-01-06 Thread Andres Freund
Hi,

On 2022-01-06 21:39:57 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I wonder if this will show the full set of spinlock contention issues - 
> > isn't
> > this only causing contention for one spinlock between two processes?
> 
> I don't think so -- the point of using the "pipelined" variant is
> that messages are passing between all N worker processes concurrently.
> (With the proposed test, I see N processes all pinning their CPUs;
> if I use the non-pipelined API, they are busy but nowhere near 100%.)

My understanding of the shm_mq code is that that ends up with N shm_mq
instances, one for each worker. After all:

> * shm_mq.c
> *   single-reader, single-writer shared memory message queue


These separate shm_mq instances forward messages in a circle,
"leader"->worker_1->worker_2->...->"leader". So there isn't a single contended
spinlock, but a bunch of different spinlocks, each with at most two backends
accessing it?


> It is just one spinlock, true, but I think the point is to gauge
> what happens with N processes all contending for the same lock.
> We could add some more complexity to use multiple locks, but
> does that really add anything but complexity?

Right, I agree that that's what we should test - it's just not immediately
obvious to me that we are.

Greetings,

Andres Freund




Re: Add spin_delay() implementation for Arm in s_lock.h

2022-01-06 Thread Tom Lane
Andres Freund  writes:
>> I landed on the idea of adding some intentional spinlock
>> contention to src/test/modules/test_shm_mq, which is a prefab test
>> framework for passing data among multiple worker processes.  The
>> attached quick-hack patch makes it grab and release a spinlock once
>> per passed message.

> I wonder if this will show the full set of spinlock contention issues - isn't
> this only causing contention for one spinlock between two processes?

I don't think so -- the point of using the "pipelined" variant is
that messages are passing between all N worker processes concurrently.
(With the proposed test, I see N processes all pinning their CPUs;
if I use the non-pipelined API, they are busy but nowhere near 100%.)

It is just one spinlock, true, but I think the point is to gauge
what happens with N processes all contending for the same lock.
We could add some more complexity to use multiple locks, but
does that really add anything but complexity?

regards, tom lane




Re: Add spin_delay() implementation for Arm in s_lock.h

2022-01-06 Thread Andres Freund
Hi,

> I landed on the idea of adding some intentional spinlock
> contention to src/test/modules/test_shm_mq, which is a prefab test
> framework for passing data among multiple worker processes.  The
> attached quick-hack patch makes it grab and release a spinlock once
> per passed message.

I wonder if this will show the full set of spinlock contention issues - isn't
this only causing contention for one spinlock between two processes? It's not
too hard to imagine delays being more important the more processes contend for
one cacheline.  I only skimmed your changes, so I might also just have
misunderstood what you were doing...

Greetings,

Andres Freund




Re: \dP and \dX use ::regclass without "pg_catalog."

2022-01-06 Thread Justin Pryzby
https://www.postgresql.org/message-id/flat/7ad8cd13-db5b-5cf6-8561-dccad1a934cb%40nttcom.co.jp
https://www.postgresql.org/message-id/flat/20210827193151.GN26465%40telsasoft.com

On Sat, Aug 28, 2021 at 08:57:32AM -0400, Álvaro Herrera wrote:
> On 2021-Aug-27, Justin Pryzby wrote:
> > I noticed that for \dP+ since 1c5d9270e, regclass is written without
> > "pg_catalog." (Alvaro and I failed to notice it in 421a2c483, too).
> 
> Oops, will fix shortly.

On Tue, Sep 28, 2021 at 04:25:11PM +0200, Magnus Hagander wrote:
> On Mon, Sep 27, 2021 at 2:14 AM Tatsuro Yamada 
>  wrote:
> > I found other functions that we should add "pg_catalog" prefix in
> > describe.c. This fix is similar to the following commit.
> 
> Yup, that's correct. Applied and backpatched to 14 (but it won't be in the
> 14.0 release).

Those two issues were fixed in 1f092a309 and 07f8a9e784.
But, we missed two casts to ::text which don't use pg_catalog.
Evidently the cast is to allow stable sorting.

I improved on my previous heuristic to look for these; this finds the two
missing schema qualifiers:

> time grep "$(sed -r "/.*pg_catalog\\.([_[:alpha:]]+).*/! d; s//\\1/; 
> /^(char|oid)$/d; s/.*/[^. ']<&>/" src/bin/psql/describe.c |sort -u)" 
> src/bin/psql/describe.c

While looking at that, I wondered why describeOneTableDetails lists stats
objects in order of OID ?  Dating back to 7b504eb28, and, before that,
554a73a6.2060...@2ndquadrant.com.  It should probably order by nsp, stxname.




Re: Add index scan progress to pg_stat_progress_vacuum

2022-01-06 Thread Imseih (AWS), Sami
Thanks for the review.

I am hesitant to make column name changes for obvious reasons, as it breaks 
existing tooling. However, I think there is a really good case to change 
"index_vacuum_count" as the name is confusing. "index_vacuum_cycles_completed" 
is the name I suggest if we agree to rename.

For the new column, "num_indexes_to_vacuum" is good with me. 

As far as max_index_vacuum_cycle_time goes, besides the points you make,
another reason is that until one cycle completes, this value will remain at 0,
so it will not be helpful for most vacuum cases. Removing it also reduces the
complexity of the patch.




On 1/6/22, 2:41 PM, "Bossart, Nathan"  wrote:

On 12/29/21, 8:44 AM, "Imseih (AWS), Sami"  wrote:
> In "pg_stat_progress_vacuum", introduce 2 columns:
>
> * total_index_vacuum : This is the # of indexes that will be vacuumed. 
Keep in mind that if failsafe mode kicks in mid-flight to the vacuum, Postgres 
may choose to forgo index scans. This value will be adjusted accordingly.
> * max_index_vacuum_cycle_time : The total elapsed time for a index vacuum 
cycle is calculated and this value will be updated to reflect the longest 
vacuum cycle. Until the first cycle completes, this value will be 0. The 
purpose of this column is to give the user an idea of how long an index vacuum 
cycle takes to complete.

I think that total_index_vacuum is a good thing to have.  I would
expect this to usually just be the number of indexes on the table, but
as you pointed out, this can be different when we are skipping
indexes.  My only concern with this new column is the potential for
confusion when compared with the index_vacuum_count value.
index_vacuum_count indicates the number of vacuum cycles completed,
but total_index_vacuum indicates the number of indexes that will be
vacuumed.  However, the names sound like they could refer to the same
thing to me.  Perhaps we should rename index_vacuum_count to
index_vacuum_cycles/index_vacuum_cycle_count, and the new column
should be something like num_indexes_to_vacuum or index_vacuum_total.

I don't think we need the max_index_vacuum_cycle_time column.  While
the idea is to give users a rough estimate for how long an index cycle
will take, I don't think it will help generate any meaningful
estimates for how much longer the vacuum operation will take.  IIUC we
won't have any idea how many total index vacuum cycles will be needed.
Even if we did, the current cycle could take much more or much less
time.  Also, none of the other progress views seem to provide any
timing information, which I suspect is by design to avoid inaccurate
estimates.

> Introduce a new view called "pg_stat_progress_vacuum_index". This view 
will track the progress of a worker ( or leader PID ) while it's vacuuming an 
index. It will expose some key columns:
>
> * pid: The PID of the worker process
>
> * leader_pid: The PID of the leader process. This is the column that can 
be joined with "pg_stat_progress_vacuum". leader_pid and pid can have the same 
value as a leader can also perform an index vacuum.
>
> * indrelid: The relid of the index currently being vacuumed
>
> * vacuum_cycle_ordinal_position: The processing position of the index 
being vacuumed. This can be useful to determine how many indexes out of the 
total indexes ( pg_stat_progress_vacuum.total_index_vacuum ) have been vacuumed
>
> * index_tuples_vacuumed: This is the number of index tuples vacuumed for 
the index overall. This is useful to show that the vacuum is actually doing 
work, as the # of tuples keeps increasing. 

Should we also provide some information for determining the progress
of the current cycle?  Perhaps there should be an
index_tuples_vacuumed_current_cycle column that users can compare with
the num_dead_tuples value in pg_stat_progress_vacuum.  However,
perhaps the number of tuples vacuumed in the current cycle can already
be discovered via index_tuples_vacuumed % max_dead_tuples.

+void
+rusage_adjust(const PGRUsage *ru0, PGRUsage *ru1)
+{
+   if (ru1->tv.tv_usec < ru0->tv.tv_usec)
+   {
+   ru1->tv.tv_sec--;
+   ru1->tv.tv_usec += 1000000;
+   }
+   if (ru1->ru.ru_stime.tv_usec < ru0->ru.ru_stime.tv_usec)
+   {
+   ru1->ru.ru_stime.tv_sec--;
+   ru1->ru.ru_stime.tv_usec += 1000000;
+   }
+   if (ru1->ru.ru_utime.tv_usec < ru0->ru.ru_utime.tv_usec)
+   {
+   ru1->ru.ru_utime.tv_sec--;
+   ru1->ru.ru_utime.tv_usec += 1000000;
+   }
+}

I think this function could benefit from a comment.  Without going
through it line by line, it is not clear to me exactly what it is
doing.

I know we're still working on what exactly this stuff should look
like, but I would suggest adding the documentation changes in the near
future.

RE: Support tab completion for upper character inputs in psql

2022-01-06 Thread tanghy.f...@fujitsu.com
On Thursday, January 6, 2022 11:57 PM, Tom Lane  wrote:
> 
> Peter Eisentraut  writes:
> > So the ordering of the suggested completions is different.  I don't know
> > offhand how that ordering is determined.  Perhaps it's dependent on
> > locale, readline version, or operating system.  In any case, we need to
> > figure this out to make this test stable.
>
> I don't think we want to get into the business of trying to make that
> consistent across different readline/libedit versions.  How about
> adjusting the test case so that only one enum value is to be printed?
> 

Thanks for your suggestion. Agreed. 
Fixed the test case to show only one enum value.

Regards,
Tang


v10-0001-Support-tab-completion-with-a-query-result-for-u.patch
Description:  v10-0001-Support-tab-completion-with-a-query-result-for-u.patch


Re: Add spin_delay() implementation for Arm in s_lock.h

2022-01-06 Thread Tom Lane
"Blake, Geoff"  writes:
> Hope everything is well going into the new year.  I'd like to pick this 
> discussion back up and your thoughts on the patch with the data I posted 2 
> weeks prior.  Is there more data that would be helpful?  Different setup?  
> Data on older versions of Postgresql to ascertain if it makes more sense on 
> versions before the large re-work of the snapshot algorithm that exhibited 
> quite a bit of synchronization contention?

I spent some time working on this.  I don't have a lot of faith in
pgbench as a measurement testbed for spinlock contention, because over
the years we've done a good job of getting rid of that in our main
code paths (both the specific change you mention, and many others).
After casting around a bit and thinking about writing a bespoke test
framework, I landed on the idea of adding some intentional spinlock
contention to src/test/modules/test_shm_mq, which is a prefab test
framework for passing data among multiple worker processes.  The
attached quick-hack patch makes it grab and release a spinlock once
per passed message.  I'd initially expected that this would show only
marginal changes, because you'd hope that a spinlock acquisition would
be reasonably cheap compared to shm_mq_receive plus shm_mq_send.
Turns out not though.

The proposed test case is

(1) patch test_shm_mq as below

(2) time this query:

SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, n);

for various values of "n" up to about how many cores you have.
(You'll probably need to bump up max_worker_processes.)
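
One way to do that (value arbitrary; max_worker_processes only takes effect
after a server restart):

    ALTER SYSTEM SET max_worker_processes = 16;
    -- then restart the server, e.g. with pg_ctl restart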

For context, on my Intel-based main workstation (8-core Xeon W-2245),
the time to do this with stock test_shm_mq is fairly level.
Reporting best-of-3 runs:

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 1);
Time: 1386.413 ms (00:01.386)

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 4);
Time: 1302.503 ms (00:01.303)

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 8);
Time: 1373.121 ms (00:01.373)

However, after applying the contention patch:

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 1);
Time: 1346.362 ms (00:01.346)

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 4);
Time: 3313.490 ms (00:03.313)

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 8);
Time: 7660.329 ms (00:07.660)

So this seems like (a) it's a plausible model for code that has
unoptimized spinlock contention, and (b) the effects are large
enough that you needn't fret too much about measurement noise.

I tried this out on a handy Apple M1 mini, which I concede
is not big iron but it's pretty decent aarch64 hardware.
With current HEAD's spinlock code, I get (again best-of-3):

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 1);
Time: 1630.255 ms (00:01.630)

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 4);
Time: 3495.066 ms (00:03.495)

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 8);
Time: 19541.929 ms (00:19.542)

With your spin-delay patch:

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 1);
Time: 1643.524 ms (00:01.644)

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 4);
Time: 3404.625 ms (00:03.405)

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 8);
Time: 19260.721 ms (00:19.261)

So I don't see a lot of reason to think your patch changes anything.
Maybe on something with more cores?

For grins I also tried this same test with the use-CAS-for-TAS patch
that was being discussed November before last, and it didn't
really show up as any improvement either:

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 1);
Time: 1608.642 ms (00:01.609)

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 4);
Time: 3396.564 ms (00:03.397)

regression=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, 8);
Time: 20092.683 ms (00:20.093)

Maybe that's a little better in the uncontended (single-worker)
case, but it's worse at the high end.

I'm really curious to hear if this measurement method shows
any interesting improvements on your hardware.

regards, tom lane

diff --git a/src/test/modules/test_shm_mq/setup.c b/src/test/modules/test_shm_mq/setup.c
index e05e97c6de..d535dbe408 100644
--- a/src/test/modules/test_shm_mq/setup.c
+++ b/src/test/modules/test_shm_mq/setup.c
@@ -135,6 +135,7 @@ setup_dynamic_shared_memory(int64 queue_size, int nworkers,
 	hdr->workers_total = nworkers;
 	hdr->workers_attached = 0;
 	hdr->workers_ready = 0;
+	hdr->messages_xfred = 0;
 	shm_toc_insert(toc, 0, hdr);
 
 	/* Set up one message queue per worker, plus one. */
diff --git a/src/test/modules/test_shm_mq/test_shm_mq.h b/src/test/modules/test_shm_mq/test_shm_mq.h
index a666121834..ff35211811 100644
--- a/src/test/modules/test_shm_mq/test_shm_mq.h
+++ b/src/test/modules/test_shm_mq/test_shm_mq.h
@@ -32,6 +32,7 @@ typedef 

Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-01-06 Thread Euler Taveira
On Thu, Jan 6, 2022, at 9:48 PM, Bossart, Nathan wrote:
> After a quick glance, I didn't see an easy way to hold a session open
> while the test does other things.  If there isn't one, modifying
> backup_fs_hot() to work with non-exclusive mode might be more trouble
> than it is worth.
You can use IPC::Run to start psql in background. See examples in
src/test/recovery.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Skipping logical replication transactions on subscriber side

2022-01-06 Thread Masahiko Sawada
On Wed, Jan 5, 2022 at 12:31 PM Amit Kapila  wrote:
>
> On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada  wrote:
> >
> > On Thu, Dec 16, 2021 at 2:42 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Thu, Dec 16, 2021 at 10:37 AM Masahiko Sawada 
> > > >  wrote:
> > > > >
> > > > > On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila 
> > > > >  wrote:
> > > > > >
> > > > > > I thought we just want to lock before clearing the skip_xid 
> > > > > > something
> > > > > > like take the lock, check if the skip_xid in the catalog is the same
> > > > > > as we have skipped, if it is the same then clear it, otherwise, 
> > > > > > leave
> > > > > > it as it is. How will that disallow users to change skip_xid when we
> > > > > > are skipping changes?
> > > > >
> > > > > Oh I thought we wanted to keep holding the lock while skipping changes
> > > > > (changing skip_xid requires acquiring the lock).
> > > > >
> > > > > So if skip_xid is already changed, the apply worker would do
> > > > > replorigin_advance() with WAL logging, instead of committing the
> > > > > catalog change?
> > > > >
> > > >
> > > > Right. BTW, how are you planning to advance the origin? Normally, a
> > > > commit transaction would do it but when we are skipping all changes,
> > > > the commit might not do it as there won't be any transaction id
> > > > assigned.
> > >
> > > I've not tested it yet but replorigin_advance() with wal_log = true
> > > seems to work for this case.
> >
> > I've tested it and realized that we cannot use replorigin_advance()
> > for this purpose without changes. That is, the current
> > replorigin_advance() doesn't allow to advance the origin by the owner:
> >
> > /* Make sure it's not used by somebody else */
> > if (replication_state->acquired_by != 0)
> > {
> > ereport(ERROR,
> > (errcode(ERRCODE_OBJECT_IN_USE),
> >  errmsg("replication origin with OID %d is already
> > active for PID %d",
> > replication_state->roident,
> > replication_state->acquired_by)));
> > }
> >
> > So we need to change it so that the origin owner can advance its
> > origin, which makes sense to me.
> >
> > Also, when we have to update the origin instead of committing the
> > catalog change while updating the origin, we cannot record the origin
> > timestamp.
> >
>
> Is it because we currently update the origin timestamp with commit record?

Yes.

>
> > This behavior makes sense to me because we skipped the
> > transaction. But ISTM it’s not good if we omit the origin timestamp
> > only when directly updating the origin. So probably we need to always
> > omit origin timestamp.
> >
>
> Do you mean to say that you want to omit it even when we are
> committing the changes?

Yes, it would be better to record only origin lsn in terms of consistency.

>
> > Apart from that, I'm vaguely concerned that the logic seems to be
> > getting complex. Probably it comes from the fact that we store
> > skip_xid in the catalog and update the catalog to clear/set the
> > skip_xid. It might be worth revisiting the idea of storing skip_xid on
> > shmem (e.g., ReplicationState)?
> >
>
> IIRC, the problem with that idea was that we won't remember skip_xid
> information after server restart and the user won't even know that it
> has to set it again.

Right, I agree that it’s not convenient when the server restarts or
crashes, but these problems may not be critical in the situations
where users have to use this feature; the subscriber has already entered
an error loop, so they can find the XID again, and it’s an uncommon case
that the server needs to restart while changes are being skipped.

Anyway, I'll submit an updated patch soon so we can discuss complexity
vs. convenience.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-01-06 Thread Bossart, Nathan
On 12/2/21, 1:34 PM, "Bossart, Nathan"  wrote:
> On 12/2/21, 9:50 AM, "David Steele"  wrote:
>> On 12/2/21 12:38, David Steele wrote:
>>> On 12/2/21 11:00, Andrew Dunstan wrote:

>>>> Should we really be getting rid of
>>>> PostgreSQL::Test::Cluster::backup_fs_hot() ?
>>>
>>> Agreed, it would be better to update backup_fs_hot() to use exclusive
>>> mode and save out backup_label instead.
>>
>> Oops, of course I meant non-exclusive mode.
>
> +1.  I'll fix that in the next revision.

I finally got around to looking into this, and I think I found why it
was done this way in 2018.  backup_fs_hot() runs pg_start_backup(),
closes the session, copies the data, and then runs pg_stop_backup() in
a different session.  This doesn't work with non-exclusive mode
because the backup will be aborted when the session that runs
pg_start_backup() is closed.  pg_stop_backup() will fail with a
"backup is not in progress" error.  Furthermore,
010_logical_decoding_timelines.pl seems to be the only test that uses
backup_fs_hot().
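
For reference, a sketch of the non-exclusive sequence being described here
(function signatures as in current releases; the session issuing these calls
has to stay connected for the whole copy):

    SELECT pg_start_backup('label', false, false);  -- third argument: exclusive => false
    -- ... copy the data directory from outside this session ...
    SELECT * FROM pg_stop_backup(false);             -- must run in the same session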

After a quick glance, I didn't see an easy way to hold a session open
while the test does other things.  If there isn't one, modifying
backup_fs_hot() to work with non-exclusive mode might be more trouble
than it is worth.

Nathan



fix libpq comment

2022-01-06 Thread Euler Taveira
Hi,

While checking the PQping code I noticed that pg_ctl does not rely on PQping
since commit f13ea95f9e4 (v10) so the attached patch removes a comment from
internal_ping().


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 3055de48fe71f47df357114c7a42db05edcdb290 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Thu, 6 Jan 2022 20:50:30 -0300
Subject: [PATCH] pg_ctl does not rely on PQping anymore

Since commit f13ea95f9e4, PQping is not used by pg_ctl.
---
 src/interfaces/libpq/fe-connect.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 9b6a6939f0..dfebfd1096 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3919,8 +3919,7 @@ internal_ping(PGconn *conn)
 		return PQPING_NO_RESPONSE;
 
 	/*
-	 * Report PQPING_REJECT if server says it's not accepting connections. (We
-	 * distinguish this case mainly for the convenience of pg_ctl.)
+	 * Report PQPING_REJECT if server says it's not accepting connections.
 	 */
 	if (strcmp(conn->last_sqlstate, ERRCODE_CANNOT_CONNECT_NOW) == 0)
 		return PQPING_REJECT;
-- 
2.20.1



Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-06 Thread Peter Geoghegan
On Thu, Jan 6, 2022 at 2:45 PM Peter Geoghegan  wrote:
> But the "freeze early" heuristics work a bit like that anyway. We
> won't freeze all the tuples on a whole heap page early if we won't
> otherwise set the heap page to all-visible (not all-frozen) in the VM
> anyway.

I believe that applications tend to update rows according to
predictable patterns. Andy Pavlo made an observation about this at one
point:

https://youtu.be/AD1HW9mLlrg?t=3202

I think that we don't do a good enough job of keeping logically
related tuples (tuples inserted around the same time) together, on the
same original heap page, which motivated a lot of my experiments with
the FSM from last year. Even still, it seems like a good idea for us
to err in the direction of assuming that tuples on the same heap page
are logically related. The tuples should all be frozen together when
possible. And *not* frozen early when the heap page as a whole can't
be frozen (barring cases with one *much* older XID before
FreezeLimit).

-- 
Peter Geoghegan




Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

2022-01-06 Thread Todd Hubers
Hi Everyone,

I have started working on this:

   - Benchmarking - increasingly more comprehensive benchmarking
   - Prototyping - to simulate the change of users (toggling back and forth)
   - Draft Implementation - of OPTION-1 (New Protocol Message)
   - (Then: Working with Odyssey and PgBouncer to add support (when the
   GRANT role privilege is available))

I hope to have a patch ready by the end of March.

Regards,

Todd

On Wed, 24 Nov 2021 at 02:46, Todd Hubers  wrote:

>
> Hi Jacob and Daniel,
>
> Thanks for your feedback.
>
> >@Daniel - I think thats conflating session_user and current_user, SET
> ROLE is not a login event. This is by design and discussed in the
> documentation..
>
> Agreed, I am using those terms loosely. I have updated option 4 in the
> proposal document. I have crossed it out. Option 5, "SET SESSION
> AUTHORIZATION", is more suitable for further consideration.
>
> >@Daniel - but it's important to remember that we need to cover the
> functionality in terms of *tests* first, performance benchmarking is
> another concern.
>
> For implementation absolutely, but not for a basic feasibility prototype.
> A quick non-secure non-reliable prototype is probably an important
> first step to confirming which options work best for the stated goals.
> Importantly, if the improvement is only 5% (whatever that might mean), then
> the project is probably not worth starting. But I do expect that a benchmark
> will prove benefits that justify the resources to build the feature(s).
>
> >@Jacob - A more modern approach might be to attach the authentication to
> the packet itself (e.g. cryptographically, with a MAC), if the goal is to
> enable per-statement authentication anyway. In theory that turns the
> middleware into a message passer instead of a confusable deputy. But it
> requires more complicated setup between the client and server.
>
> I did consider this, but I ruled it out. I have now added it to the
> proposal document, and included two Issues. Please review and let me know
> whether I might be mistaken.
>
> >@Jacob - Having protocol-level tests for bytes on the wire would not only
> help proposals like this but also get coverage for a huge number of edge
> cases. Magnus has added src/test/protocol for the server, written in Perl,
> in his PROXY proposal. And I've added a protocol suite for both the client
> and server, written in Python/pytest, in my OAuth proof of concept. I think
> something is badly needed in this area.
>
> Thanks for highlighting this emerging work. I have noted this in the
> proposal in the Next Steps section.
>
> --Todd
>
> Note: Here is the proposal document link again -
> https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit#
>
> On Tue, 23 Nov 2021 at 12:12, Jacob Champion  wrote:
>
>> On Sat, 2021-11-20 at 16:16 -0500, Tom Lane wrote:
>> > One more point is that the proposed business about
>> >
>> > * ImpersonateDatabaseUser will either succeed silently (0-RTT), or
>> >   fail. Upon failure, no further commands will be processed until
>> >   ImpersonateDatabaseUser succeeds.
>> >
>> > seems to require adding a huge amount of complication on the server
>> side,
>> > and complication in the protocol spec itself, to save a rather minimal
>> > amount of complication in the middleware.  Why can't we just say that
>> > a failed "impersonate" command leaves the session in the same state
>> > as before, and it's up to the pooler to do something about it?  We are
>> > in any case trusting the pooler not to send commands from user A to
>> > a session logged in as user B.
>>
>> When combined with the 0-RTT goal, I think a silent ignore would just
>> invite more security problems. Todd is effectively proposing packet
>> pipelining, so the pipeline has to fail shut.
>>
>> A more modern approach might be to attach the authentication to the
>> packet itself (e.g. cryptographically, with a MAC), if the goal is to
>> enable per-statement authentication anyway. In theory that turns the
>> middleware into a message passer instead of a confusable deputy. But it
>> requires more complicated setup between the client and server.
>>
>> > PS: I wonder how we test such a feature meaningfully without
>> > incorporating a pooler right into the Postgres tree.  I don't
>> > want to do that, for sure.
>>
>> Having protocol-level tests for bytes on the wire would not only help
>> proposals like this but also get coverage for a huge number of edge
>> cases. Magnus has added src/test/protocol for the server, written in
>> Perl, in his PROXY proposal. And I've added a protocol suite for both
>> the client and server, written in Python/pytest, in my OAuth proof of
>> concept. I think something is badly needed in this area.
>>
>> --Jacob
>>
>
>
> --
> --
> Todd Hubers
>


-- 
--
Todd Hubers


Re: Add jsonlog log_destination for JSON server logs

2022-01-06 Thread Andrew Dunstan


On 1/6/22 13:06, Andrew Dunstan wrote:
> On 1/5/22 02:32, Michael Paquier wrote:
>> On Sun, Jan 02, 2022 at 01:34:45PM -0800, Andres Freund wrote:
>>> The tests don't seem to pass on windows:
>>> https://cirrus-ci.com/task/5412456754315264?logs=test_bin#L47
>>> https://api.cirrus-ci.com/v1/artifact/task/5412456754315264/tap/src/bin/pg_ctl/tmp_check/log/regress_log_004_logrotate
>>>
>>> psql::1: ERROR:  division by zero
>>> could not open 
>>> "c:/cirrus/src/bin/pg_ctl/tmp_check/t_004_logrotate_primary_data/pgdata/current_logfiles":
>>>  The system cannot find the file specified at t/004_logrotate.pl line 87.
>> This seems to point out that the syslogger is too slow to capture the
>> logrotate signal, and the patch set is introducing nothing new in
>> terms of infrastructure, just an extra value for log_destination.
>> This stuff passes here, and I am not spotting something amiss after an
>> extra close read.
>>
>> Attached is an updated patch set that increases the test timeout (5min
>> -> 10min).  That should help, I assume.
>
> ITYM 3 min -> 6  min. But in any case, is that really going to solve
> this? The file should exist, even if its contents are not up to date, AIUI.



I have tested on an msys2 setup with your v8 patches and I am getting this:


#   Failed test 'current_logfiles is sane'
#   at t/004_logrotate.pl line 96.
#   'stderr log/postgresql-2022-01-06_222419.log
# csvlog log/postgresql-2022-01-06_222419.csv
# '
# doesn't match '(?^:^stderr log/postgresql-.*log
# csvlog log/postgresql-.*csv
# jsonlog log/postgresql-.*json$)'

#   Failed test 'found expected log file content for stderr'
#   at t/004_logrotate.pl line 103.
#   ''
# doesn't match '(?^:division by zero)'

#   Failed test 'found expected log file content for jsonlog'
#   at t/004_logrotate.pl line 105.
#   undef
# doesn't match '(?^:division by zero)'

#   Failed test 'pg_current_logfile() gives correct answer with jsonlog'
#   at t/004_logrotate.pl line 105.
#  got: ''
# expected: undef
# Looks like you failed 4 tests of 14.
[22:37:31] t/004_logrotate.pl ...
Dubious, test returned 4 (wstat 1024, 0x400)
Failed 4/14 subtests
[22:37:31]


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-06 Thread Peter Geoghegan
On Thu, Jan 6, 2022 at 12:54 PM Robert Haas  wrote:
> On Fri, Dec 17, 2021 at 9:30 PM Peter Geoghegan  wrote:
> > Can we fully get rid of vacuum_freeze_table_age? Maybe even get rid of
> > vacuum_freeze_min_age, too? Freezing tuples is a maintenance task for
> > physical blocks, but we use logical units (XIDs).
>
> I don't see how we can get rid of these. We know that catastrophe will
> ensue if we fail to freeze old XIDs for a sufficiently long time ---
> where sufficiently long has to do with the number of XIDs that have
> been subsequently consumed.

I don't really disagree with anything you've said, I think. There are
a few subtleties here. I'll try to tease them apart.

I agree that we cannot do without something like vacrel->FreezeLimit
for the foreseeable future -- but the closely related GUC
(vacuum_freeze_min_age) is another matter. Although everything you've
said in favor of the GUC seems true, the GUC is not a particularly
effective (or natural) way of constraining the problem. It just
doesn't make sense as a tunable.

One obvious reason for this is that the opportunistic freezing stuff
is expected to be the thing that usually forces freezing -- not
vacuum_freeze_min_age, nor FreezeLimit, nor any other XID-based
cutoff. As you more or less pointed out yourself, we still need
FreezeLimit as a backstop mechanism. But the value of FreezeLimit can
just come from autovacuum_freeze_max_age/2 in all cases (no separate
GUC), or something along those lines. We don't particularly expect the
value of FreezeLimit to matter, at least most of the time. It should
only noticeably affect our behavior during anti-wraparound VACUUMs,
which become rare with the patch (e.g. my pgbench_accounts example
upthread). Most individual tables will never get even one
anti-wraparound VACUUM -- it just doesn't ever come for most tables in
practice.

My big issue with vacuum_freeze_min_age is that it doesn't really work
with the freeze map work in 9.6, which creates problems that I'm
trying to address by freezing early and so on. After all, HEAD (and
all stable branches) can easily set a page to all-visible (but not
all-frozen) in the VM, meaning that the page's tuples won't be
considered for freezing until the next aggressive VACUUM. This means
that vacuum_freeze_min_age is already frequently ignored by the
implementation -- it's conditioned on other things that are practically
impossible to predict.

Curious about your thoughts on this existing issue with
vacuum_freeze_min_age. I am concerned about the "freezing cliff" that
it creates.

> So it's natural to decide whether or not
> we're going to wait for cleanup locks on pages on the basis of how old
> the XIDs they contain actually are.

I agree, but again, it's only a backstop. With the patch we'd have to
be rather unlucky to ever need to wait like this.

What are the chances that we keep failing to freeze an old XID from
one particular page, again and again? My testing indicates that it's a
negligible concern in practice (barring pathological cases with idle
cursors, etc).

> I think vacuum_freeze_min_age also serves a useful purpose: it
> prevents us from freezing data that's going to be modified again or
> even deleted in the near future. Since we can't know the future, we
> must base our decision on the assumption that the future will be like
> the past: if the page hasn't been modified for a while, then we should
> assume it's not likely to be modified again soon; otherwise not.

But the "freeze early" heuristics work a bit like that anyway. We
won't freeze all the tuples on a whole heap page early if we won't
otherwise set the heap page to all-visible (not all-frozen) in the VM
anyway.

> If we
> knew the time at which the page had last been modified, it would be
> very reasonable to use that here - say, freeze the XIDs if the page
> hasn't been touched in an hour, or whatever. But since we lack such
> timestamps the XID age is the closest proxy we have.

XID age is a *terrible* proxy. The age of an XID in a tuple header may
advance quickly, even when nobody modifies the same table at all.

I concede that it is true that we are (in some sense) "gambling" by
freezing early -- we may end up freezing a tuple that we subsequently
update anyway. But aren't we also "gambling" by *not* freezing early?
By not freezing, we risk getting into "freezing debt" that will have
to be paid off in one ruinously large installment. I would much rather
"gamble" on something where we can tolerate consistently "losing" than
gamble on something where I cannot ever afford to lose (even if it's
much less likely that I'll lose during any given VACUUM operation).

Besides all this, I think that we have a rather decent chance of
coming out ahead in practice by freezing early. In practice the
marginal cost of freezing early is consistently pretty low.
Cost-control-driven (as opposed to need-driven) freezing is *supposed*
to be cheaper, of course. And like it or not, freezing is really just part 

Re: row filtering for logical replication

2022-01-06 Thread Peter Smith
On Wed, Jan 5, 2022 at 1:05 PM wangw.f...@fujitsu.com
 wrote:
>
> On Thu, Jan 4, 2022 at 00:54 PM Peter Smith  wrote:
> > Modified in v58 [1] as suggested
> Thanks for updating the patches.
> A few comments about v58-0001 and v58-0002.
>
> v58-0001
> 1.
> How about modifying the following loop in copy_table by using for_each_from
> instead of foreach?
> Like the invocation of for_each_from in function get_rule_expr.
> from:
> if (qual != NIL)
> {
> ListCell   *lc;
> bool    first = true;
>
> appendStringInfoString(&cmd, " WHERE ");
> foreach(lc, qual)
> {
> char   *q = strVal(lfirst(lc));
>
> if (first)
> first = false;
> else
> appendStringInfoString(&cmd, " OR ");
> appendStringInfoString(&cmd, q);
> }
> list_free_deep(qual);
> }
> change to:
> if (qual != NIL)
> {
> ListCell   *lc;
> char   *q = strVal(linitial(qual));
>
> appendStringInfo(&cmd, " WHERE %s", q);
> for_each_from(lc, qual, 1)
> {
> q = strVal(lfirst(lc));
> appendStringInfo(&cmd, " OR %s", q);
> }
> list_free_deep(qual);
> }
>

Modified as suggested in v59* [1]

> 2.
> I find the API of get_rel_sync_entry is modified.
> -get_rel_sync_entry(PGOutputData *data, Oid relid)
> +get_rel_sync_entry(PGOutputData *data, Relation relation)
> It looks like just moving the invocation of RelationGetRelid from outside into
> function get_rel_sync_entry. I am not sure whether this modification is
> necessary to this feature or not.
>

Fixed in v59* [1]. Removed the unnecessary changes.

> v58-0002
> 1.
> In function pgoutput_row_filter_init, if no_filter is set, I think we do not
> need to add the row filter to the list (rfnodes).
> So how about changing the three conditions where we add row filters to rfnodes like
> this:
> -   if (pub->pubactions.pubinsert)
> +   if (pub->pubactions.pubinsert && 
> !no_filter[idx_ins])
> {
> rfnode = 
> stringToNode(TextDatumGetCString(rfdatum));
> rfnodes[idx_ins] = 
> lappend(rfnodes[idx_ins], rfnode);
> }
>

TODO.

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPsiw9fbOUTpCMWirut1ZD5hbWk8_U9tZya4mG-YK%2Bfq8g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: row filtering for logical replication

2022-01-06 Thread Peter Smith
On Wed, Jan 5, 2022 at 5:01 PM vignesh C  wrote:
>
...
> 4) Should this be posted as a separate patch in a new thread, as it is
> not part of row filtering:
> --- a/src/include/parser/parse_node.h
> +++ b/src/include/parser/parse_node.h
> @@ -79,7 +79,7 @@ typedef enum ParseExprKind
> EXPR_KIND_CALL_ARGUMENT,/* procedure argument in CALL */
> EXPR_KIND_COPY_WHERE,   /* WHERE condition in COPY FROM */
> EXPR_KIND_GENERATED_COLUMN, /* generation expression for a column */
> -   EXPR_KIND_CYCLE_MARK,   /* cycle mark value */
> +   EXPR_KIND_CYCLE_MARK/* cycle mark value */
>  } ParseExprKind;
>

Fixed in v59* [1]

I started a new thread (with patch) for this one. See
https://www.postgresql.org/message-id/flat/CAHut%2BPsqr93nng7diTXxtUD636u7ytA%3DMq2duRphs0CBzpfDTA%40mail.gmail.com

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPsiw9fbOUTpCMWirut1ZD5hbWk8_U9tZya4mG-YK%2Bfq8g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: row filtering for logical replication

2022-01-06 Thread Peter Smith
On Wed, Jan 5, 2022 at 4:34 PM Peter Smith  wrote:
>
> I have reviewed again the source code for v58-0001.
>
> Below are my review comments.
>
> Actually, I intend to fix most of these myself for v59*, so this post
> is just for records.
>
> v58-0001 Review Comments
> 
>
> 1. doc/src/sgml/ref/alter_publication.sgml - reword for consistency
>
> +  name to explicitly indicate that descendant tables are included. If the
> +  optional WHERE clause is specified, rows that do not
> +  satisfy the expression 
> will
> +  not be published. Note that parentheses are required around the
>
> For consistency, it would be better to reword this sentence about the
> expression to be more similar to the one in CREATE PUBLICATION, which
> now says:
>
> +  If the optional WHERE clause is specified, rows for
> +  which the expression 
> returns
> +  false or null will not be published. Note that parentheses are required
> +  around the expression. It has no effect on TRUNCATE
> +  commands.
>

Updated in v59* [1]
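
As a point of reference, a minimal example of the documented behaviour
(hypothetical table and publication names; syntax as proposed by this patch
series):

    CREATE PUBLICATION pub_active_employees
        FOR TABLE employees WHERE (active IS TRUE);
    -- rows for which (active IS TRUE) evaluates to false or null are not published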

> ~~
>
> 2. doc/src/sgml/ref/create_subscription.sgml - reword for consistency
>
> @@ -319,6 +324,25 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name
> the parameter create_slot = false.  This is an
> implementation restriction that might be lifted in a future release.
>
> +
> +  
> +   If any table in the publication has a WHERE clause, 
> rows
> +   that do not satisfy the <replaceable class="parameter">expression</replaceable>
> +   will not be published. If the subscription has several publications in 
> which
>
> For consistency, it would be better to reword this sentence about the
> expression to be more similar to the one in CREATE PUBLICATION, which
> now says:
>
> +  If the optional WHERE clause is specified, rows for
> +  which the expression 
> returns
> +  false or null will not be published. Note that parentheses are required
> +  around the expression. It has no effect on TRUNCATE
> +  commands.
>

Updated in v59* [1]

> ~~
>
> 3. src/backend/catalog/pg_publication.c - whitespace
>
> +rowfilter_walker(Node *node, Relation relation)
> +{
> + char*errdetail_msg = NULL;
> +
> + if (node == NULL)
> + return false;
> +
> +
> + if (IsRowFilterSimpleExpr(node))
>
> Remove the extra blank line.
>

Fixed in v59* [1]

> ~~
>
> 4. src/backend/executor/execReplication.c - move code
>
> + bad_rfcolnum = GetRelationPublicationInfo(rel, true);
> +
> + /*
> + * It is only safe to execute UPDATE/DELETE when all columns referenced in
> + * the row filters from publications which the relation is in are valid,
> + * which means all referenced columns are part of REPLICA IDENTITY, or the
> + * table do not publish UPDATES or DELETES.
> + */
> + if (AttributeNumberIsValid(bad_rfcolnum))
>
> I felt that the bad_rfcolnum assignment belongs below the large
> comment explaining this logic.
>

Fixed in v59* [1]

> ~~
>
> 5. src/backend/executor/execReplication.c - fix typo
>
> + /*
> + * It is only safe to execute UPDATE/DELETE when all columns referenced in
> + * the row filters from publications which the relation is in are valid,
> + * which means all referenced columns are part of REPLICA IDENTITY, or the
> + * table do not publish UPDATES or DELETES.
> + */
>
> Typo: "table do not publish" -> "table does not publish"
>

Fixed in v59* [1]

> ~~
>
> 6. src/backend/replication/pgoutput/pgoutput.c - fix typo
>
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> + /* Gather the rfnodes per pubaction of this publiaction. */
> + if (pub->pubactions.pubinsert)
>
> Typo: "publiaction" --> "publication"
>

Fixed in v59* [1]

> ~~
>
> 7. src/backend/utils/cache/relcache.c - fix comment case
>
> @@ -267,6 +271,19 @@ typedef struct opclasscacheent
>
>  static HTAB *OpClassCache = NULL;
>
> +/*
> + * Information used to validate the columns in the row filter expression. see
> + * rowfilter_column_walker for details.
> + */
>
> Typo: "see" --> "See"
>

Fixed in v59* [1]

> ~~
>
> 8. src/backend/utils/cache/relcache.c - "row-filter"
>
> For consistency with all other naming change all instances of
> "row-filter" to "row filter" in this file.
>

Fixed in v59* [1]

> ~~
>
> 9. src/backend/utils/cache/relcache.c - fix typo
>

Fixed in v59* [1]

> ~~
>
> 10. src/backend/utils/cache/relcache.c - comment confused wording?
>
> Function GetRelationPublicationInfo:
>
> + /*
> + * For a partition, if pubviaroot is true, check if any of the
> + * ancestors are published. If so, note down the topmost ancestor
> + * that is published via this publication, the row filter
> + * expression on which will be used to filter the partition's
> + * changes. We could have got the topmost ancestor when collecting
> + * the publication oids, but that will make the code more
> + * complicated.
> + */
>
> Typo: Probably "on which' --> "of which" ?
>

Fixed in v59* [1]

> ~~
>
> 11. src/backend/utils/cache/relcache.c - GetRelationPublicationActions
>
> Something seemed slightly 

Re: row filtering for logical replication

2022-01-06 Thread Peter Smith
On Wed, Jan 5, 2022 at 4:26 PM Amit Kapila  wrote:
>
> On Tue, Jan 4, 2022 at 12:15 PM Peter Smith  wrote:
> >
> > On Fri, Dec 31, 2021 at 12:39 AM houzj.f...@fujitsu.com
> >  wrote:
> > > > 3) v55-0002
> > > > +static bool pgoutput_row_filter_update_check(enum
> > > > ReorderBufferChangeType changetype, Relation relation,
> > > > +
> > > >HeapTuple oldtuple, HeapTuple newtuple,
> > > > +
> > > >RelationSyncEntry *entry, ReorderBufferChangeType *action);
> > > >
> > > > Do we need parameter changetype here? I think it could only be
> > > > REORDER_BUFFER_CHANGE_UPDATE.
> > >
> > > I didn't change this, I think it might be better to wait for Ajin's 
> > > opinion.
> >
> > I agree with Tang. AFAIK there is no problem removing that redundant
> > param as suggested. BTW - the Assert within that function is also
> > incorrect because the only possible value is
> > REORDER_BUFFER_CHANGE_UPDATE. I will make these fixes in a future
> > version.
> >
>
> That sounds fine to me too. One more thing is that you don't need to
> modify the action in case it remains update as the caller has already
> set that value. Currently, we are modifying it as update at two places
> in this function, we can remove both of those and keep the comments
> intact for the later update.
>

Fixed in v59* [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPsiw9fbOUTpCMWirut1ZD5hbWk8_U9tZya4mG-YK%2Bfq8g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: row filtering for logical replication

2022-01-06 Thread Peter Smith
On Wed, Jan 5, 2022 at 4:56 PM Amit Kapila  wrote:
>
...

> > > 4.
> > > +#define IDX_PUBACTION_n 3
> > > + ExprState  *exprstate[IDX_PUBACTION_n]; /* ExprState array for row
> > > filter.
> > > +One per publication action. */
> > > ..
> > > ..
> > >
> > > I think we can have this define outside the structure. I don't like
> > > this define name, can we name it NUM_ROWFILTER_TYPES or something like
> > > that?
> > >
> >
> > Partly fixed in v51* [1], I've changed the #define name but I did not
> > move it. The adjacent comment talks about these ExprState caches and
> > explains the reason why the number is 3. So if I move the #define then
> > half that comment would have to move with it. I thought it is better
> > to keep all the related parts grouped together with the one
> > explanatory comment, but if you still want the #define moved please
> > confirm and I can do it in a future version.
> >
>
> Yeah, I would prefer it to be moved. You can move the part of the
> comment suggesting three pubactions can be used for row filtering.
>

Fixed in v59* [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPsiw9fbOUTpCMWirut1ZD5hbWk8_U9tZya4mG-YK%2Bfq8g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

2022-01-06 Thread Bossart, Nathan
On 12/21/21, 11:42 AM, "Mark Dilger"  wrote:
> +   /* pre-v9.3 lock-only bit pattern */
> +   ereport(ERROR,
> +   (errcode(ERRCODE_DATA_CORRUPTED),
> +errmsg_internal("found tuple with HEAP_XMAX_COMMITTED 
> and"
> +"HEAP_XMAX_EXCL_LOCK set and "
> +"HEAP_XMAX_IS_MULTI unset")));
> +   }
> +
>
> I find this bit hard to understand.  Does the comment mean to suggest that 
> the *upgrade* process should have eliminated all pre-v9.3 bit patterns, and 
> therefore any such existing patterns are certainly corruption, or does it 
> mean that data written by pre-v9.3 servers (and not subsequently updated) is 
> defined as corrupt, or  ?
>
> I am not complaining that the logic is wrong, just trying to wrap my head 
> around what the comment means.

This is just another way that a tuple may be marked locked-only, and
we want to explicitly disallow locked-only + xmax-committed.  This bit
pattern may be present on servers that were pg_upgraded from pre-v9.3
versions.  See commits 0ac5ad5 and 74ebba8 for more detail.

Nathan



Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Joel Jacobson
On Thu, Jan 6, 2022, at 21:38, Pavel Stehule wrote:
> I am saying that, semantically - as I understand the meaning of the word "in" -
> it is not good to use it as a generic alias for function arguments (when we have
> OUT arguments too).

Trying to imagine a situation where you would need a shorthand also for OUT 
parameters in a real-life PL/pgSQL function.
Can you give an example?

I can only think of situations where it is needed for IN parameters.

/Joel

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-06 Thread Robert Haas
On Fri, Dec 17, 2021 at 9:30 PM Peter Geoghegan  wrote:
> Can we fully get rid of vacuum_freeze_table_age? Maybe even get rid of
> vacuum_freeze_min_age, too? Freezing tuples is a maintenance task for
> physical blocks, but we use logical units (XIDs).

I don't see how we can get rid of these. We know that catastrophe will
ensue if we fail to freeze old XIDs for a sufficiently long time ---
where sufficiently long has to do with the number of XIDs that have
been subsequently consumed. So it's natural to decide whether or not
we're going to wait for cleanup locks on pages on the basis of how old
the XIDs they contain actually are. Admittedly, that decision doesn't
need to be made at the start of the vacuum, as we do today. We could
happily skip waiting for a cleanup lock on pages that contain only
newer XIDs, but if there is a page that both contains an old XID and
stays pinned for a long time, we eventually have to sit there and wait
for that pin to be released. And the best way to decide when to switch
to that strategy is really based on the age of that XID, at least as I
see it, because it is the age of that XID reaching 2 billion that is
going to kill us.

I think vacuum_freeze_min_age also serves a useful purpose: it
prevents us from freezing data that's going to be modified again or
even deleted in the near future. Since we can't know the future, we
must base our decision on the assumption that the future will be like
the past: if the page hasn't been modified for a while, then we should
assume it's not likely to be modified again soon; otherwise not. If we
knew the time at which the page had last been modified, it would be
very reasonable to use that here - say, freeze the XIDs if the page
hasn't been touched in an hour, or whatever. But since we lack such
timestamps the XID age is the closest proxy we have.

> The
> risk mostly comes from how much total work we still need to do to
> advance relfrozenxid. If the single old XID is quite old indeed (~1.5
> billion XIDs), but there is only one, then we just have to freeze one
> tuple to be able to safely advance relfrozenxid (maybe advance it by a
> huge amount!). How long can it take to freeze one tuple, with the
> freeze map, etc?

I don't really see any reason for optimism here. There could be a lot
of unfrozen pages in the relation, and we'd have to troll through all
of those in order to find that single old XID. Moreover, there is
nothing whatsoever to focus autovacuum's attention on that single old
XID rather than anything else. Nothing in the autovacuum algorithm
will cause it to focus its efforts on that single old XID at a time
when there's no pin on the page, or at a time when that XID becomes
the thing that's holding back vacuuming throughout the cluster. A lot
of vacuum problems that users experience today would be avoided if
autovacuum had perfect knowledge of what it ought to be prioritizing
at any given time, or even some knowledge. But it doesn't, and is
often busy fiddling while Rome burns.

IOW, the time that it takes to freeze that one tuple *in theory* might
be small. But in practice it may be very large, because we won't
necessarily get around to it on any meaningful time frame.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: psql: \dl+ to list large objects privileges

2022-01-06 Thread Pavel Luzanov

On 06.01.2022 21:13, Tom Lane wrote:

>> I made the same changes to two files in the 'expected' directory:
>> largeobject.out and largeobject_1.out.
>> Although I must say that I still can't understand why more than one file
>> with expected result is used for some tests.
>
> Because the output sometimes varies across platforms.  IIRC, the
> case where largeobject_1.out is needed is for Windows, where the
> file that gets inserted into one of the LOs might contain CR/LF
> not just LF newlines, so the LO contents look different.


So simple. Thanks for the explanation.


> Pushed with some minor editorialization.


Thanks!

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company





Re: Make relfile tombstone files conditional on WAL level

2022-01-06 Thread Andres Freund
On 2022-01-06 08:52:01 -0500, Robert Haas wrote:
> On Thu, Jan 6, 2022 at 3:47 AM Thomas Munro  wrote:
> > Another problem is that relfilenodes are normally allocated with
> > GetNewOidWithIndex(), and initially match a relation's OID.  We'd need
> > a new allocator, and they won't be able to match the OID in general
> > (while we have 32 bit OIDs at least).
> 
> Personally I'm not sad about that. Values that are the same in simple
> cases but diverge in more complex cases are kind of a trap for the
> unwary.

+1




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-06 Thread Peter Geoghegan
On Tue, Jan 4, 2022 at 9:40 PM Fujii Masao  wrote:
> Could you tell me what happens if new tuple with XID larger than xid_base + 
> 0x is inserted into the page? Such new tuple is not allowed to be 
> inserted into that page?

I fear that this patch will have many bugs along these lines. Example:
Why is it okay that convert_page() may have to defragment a heap page,
without holding a cleanup lock? That will subtly break code that holds
a pin on the buffer, when a tuple slot contains a C pointer to a
HeapTuple in shared memory (though only if we get unlucky).

Currently we are very permissive about what XID/backend can (or
cannot) consume a specific piece of free space from a specific heap
page in code like RelationGetBufferForTuple(). It would be very hard
to enforce a policy like "your XID cannot insert onto this particular
heap page" with the current FSM design. (I actually think that we
should have a FSM that supports these requirements, but that's a big
project -- a long term goal of mine [1].)

> Or xid_base and xids of all existing tuples in the page are increased? Also 
> what happens if one of those xids (of existing tuples) cannot be changed 
> because the tuple still can be seen by very-long-running transaction?

That doesn't work in the general case, I think. How could it, unless
we truly had 64-bit XIDs in heap tuple headers? You can't necessarily
freeze to fix the problem, because we can only freeze XIDs that 1.)
committed, and 2.) are visible to every possible MVCC snapshot. (I
think you were alluding to this problem yourself.)

I believe that a good solution to the problem that this patch tries to
solve needs to be more ambitious. I think that we need to return to
first principles, rather than extending what we have already.
Currently, we store XIDs in tuple headers so that we can determine the
tuple's visibility status, based on whether the XID committed (or
aborted), and where our snapshot sees the XID as "in the past" (in the
case of an inserted tuple's xmin). You could say that XIDs from tuple
headers exist so we can see differences *between* tuples.  But these
differences are typically not useful/interesting for very long. 32-bit
XIDs are sometimes not wide enough, but usually they're "too wide":
Why should we need to consider an old XID (e.g. do clog lookups) at
all, barring extreme cases?

Why do we need to keep any kind of metadata about transactions around
for a long time? Postgres has not supported time travel in 25 years!

If we eagerly cleaned-up aborted transactions with a special kind of
VACUUM (which would remove aborted XIDs), we could also maintain a
structure that indicates if all of the XIDs on a heap page are known
"all committed" implicitly (no dirtying the page, no hint bits, etc)
-- something a little like the visibility map, that is mostly set
implicitly (not during VACUUM). That doesn't fix the wraparound
problem itself, of course. But it enables a design that imposes the
same problem on the specific old snapshot instead -- something like a
"snapshot too old" error is much better than a system-wide wraparound
failure. That approach is definitely very hard, and also requires a
smart FSM along the lines described in [1], but it seems like the best
way forward.

As I pointed out already, freezing is bad because it imposes the
requirement that everybody considers an affected XID committed and
visible, which is brittle (e.g., old snapshots can cause wraparound
failure). More generally, we rely too much on explicitly maintaining
"absolute" metadata inline, when we should implicitly maintain
"relative" metadata (that can be discarded quickly and without concern
for old snapshots). We need to be more disciplined about what XIDs can
modify what heap pages in the first place (in code like hio.c and the
FSM) to make all this work.

[1] 
https://www.postgresql.org/message-id/CAH2-Wz%3DzEV4y_wxh-A_EvKxeAoCMdquYMHABEh_kZO1rk3a-gw%40mail.gmail.com
--
Peter Geoghegan




Re: Add index scan progress to pg_stat_progress_vacuum

2022-01-06 Thread Bossart, Nathan
On 12/29/21, 8:44 AM, "Imseih (AWS), Sami"  wrote:
> In "pg_stat_progress_vacuum", introduce 2 columns:
>
> * total_index_vacuum : This is the # of indexes that will be vacuumed. Keep 
> in mind that if failsafe mode kicks in mid-flight to the vacuum, Postgres may 
> choose to forgo index scans. This value will be adjusted accordingly.
> * max_index_vacuum_cycle_time : The total elapsed time for a index vacuum 
> cycle is calculated and this value will be updated to reflect the longest 
> vacuum cycle. Until the first cycle completes, this value will be 0. The 
> purpose of this column is to give the user an idea of how long an index 
> vacuum cycle takes to complete.

I think that total_index_vacuum is a good thing to have.  I would
expect this to usually just be the number of indexes on the table, but
as you pointed out, this can be different when we are skipping
indexes.  My only concern with this new column is the potential for
confusion when compared with the index_vacuum_count value.
index_vacuum_count indicates the number of vacuum cycles completed,
but total_index_vacuum indicates the number of indexes that will be
vacuumed.  However, the names sound like they could refer to the same
thing to me.  Perhaps we should rename index_vacuum_count to
index_vacuum_cycles/index_vacuum_cycle_count, and the new column
should be something like num_indexes_to_vacuum or index_vacuum_total.

I don't think we need the max_index_vacuum_cycle_time column.  While
the idea is to give users a rough estimate for how long an index cycle
will take, I don't think it will help generate any meaningful
estimates for how much longer the vacuum operation will take.  IIUC we
won't have any idea how many total index vacuum cycles will be needed.
Even if we did, the current cycle could take much more or much less
time.  Also, none of the other progress views seem to provide any
timing information, which I suspect is by design to avoid inaccurate
estimates.

> Introduce a new view called "pg_stat_progress_vacuum_index". This view will 
> track the progress of a worker ( or leader PID ) while it's vacuuming an 
> index. It will expose some key columns:
>
> * pid: The PID of the worker process
>
> * leader_pid: The PID of the leader process. This is the column that can be 
> joined with "pg_stat_progress_vacuum". leader_pid and pid can have the same 
> value as a leader can also perform an index vacuum.
>
> * indrelid: The relid of the index currently being vacuumed
>
> * vacuum_cycle_ordinal_position: The processing position of the index being 
> vacuumed. This can be useful to determine how many indexes out of the total 
> indexes ( pg_stat_progress_vacuum.total_index_vacuum ) have been vacuumed
>
> * index_tuples_vacuumed: This is the number of index tuples vacuumed for the 
> index overall. This is useful to show that the vacuum is actually doing work, 
> as the # of tuples keeps increasing. 

Should we also provide some information for determining the progress
of the current cycle?  Perhaps there should be an
index_tuples_vacuumed_current_cycle column that users can compare with
the num_dead_tuples value in pg_stat_progress_vacuum.  However,
perhaps the number of tuples vacuumed in the current cycle can already
be discovered via index_tuples_vacuumed % max_dead_tuples.
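
For illustration, a sketch of how the proposed view could be joined with
pg_stat_progress_vacuum (the view and the non-standard columns used here are
the ones proposed in this thread, not existing catalog objects):

    SELECT v.relid::regclass    AS table_name,
           i.indrelid::regclass AS index_name,
           i.vacuum_cycle_ordinal_position,
           v.total_index_vacuum,
           i.index_tuples_vacuumed
    FROM pg_stat_progress_vacuum v
    JOIN pg_stat_progress_vacuum_index i ON i.leader_pid = v.pid;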

+void
+rusage_adjust(const PGRUsage *ru0, PGRUsage *ru1)
+{
+   if (ru1->tv.tv_usec < ru0->tv.tv_usec)
+   {
+   ru1->tv.tv_sec--;
+   ru1->tv.tv_usec += 1000000;
+   }
+   if (ru1->ru.ru_stime.tv_usec < ru0->ru.ru_stime.tv_usec)
+   {
+   ru1->ru.ru_stime.tv_sec--;
+   ru1->ru.ru_stime.tv_usec += 1000000;
+   }
+   if (ru1->ru.ru_utime.tv_usec < ru0->ru.ru_utime.tv_usec)
+   {
+   ru1->ru.ru_utime.tv_sec--;
+   ru1->ru.ru_utime.tv_usec += 1000000;
+   }
+}

I think this function could benefit from a comment.  Without going
through it line by line, it is not clear to me exactly what it is
doing.

I know we're still working on what exactly this stuff should look
like, but I would suggest adding the documentation changes in the near
future.

Nathan



Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Pavel Stehule
On Thu, Jan 6, 2022 at 9:04 PM Joel Jacobson  wrote:

> On Thu, Jan 6, 2022, at 20:24, Pavel Stehule wrote:
> > But there is nothing similar in standard.
> > Standard doesn't specify any column or table or label names in the
> custom area.
>
> I think that's an unfair comparison.
> This isn't at all the same thing as dictating column or table names.
> I merely suggest reusing an existing reserved keyword.
>
> >>On Thu, Jan 6, 2022 at 8:03 PM Joel Jacobson 
> wrote:
> >>
> >>If "in." would work, due to "in" being a reserved SQL keyword,
> >>don't you think the benefits of a SQL standardized solution would
> outweigh our
> >>personal preferences on what word each one of us prefer?
> >
> >I know that "in" is a reserved word in SQL, but I have not any knowledge
> of it being used as alias in SQL functions or in >SQL/PSM functions.
>
> Are you concerned "in" might already be used as an alias somehow?
>

I did not write a fully correct sentence. I wanted to write "alias of arguments of
SQL functions or SQL/PSM functions". I am sorry.


> I did some testing:
>
> "in" can be used as a column alias:
>
> => SELECT 123 AS in;
> in
> -
> 123
> (1 row)
>
> But it cannot be used as a table alias, which is what matters:
>
> => WITH in AS (SELECT 1) SELECT * FROM in;
> ERROR:  syntax error at or near "in"
>
> => SELECT * FROM t in;
> ERROR:  syntax error at or near "in"
>
> => SELECT * FROM t AS in;
> ERROR:  syntax error at or near "in"
>
> It's also currently not possible to use it as a PL/pgSQL alias:
>
> => CREATE FUNCTION f(id int)
> RETURNS void
> LANGUAGE plpgsql AS $$
> DECLARE
> in ALIAS FOR $1;
> BEGIN
> END
> $$;
> ERROR:  syntax error at or near "in"
> LINE 5: in ALIAS FOR $1;
>
>
I didn't say, so "IN" cannot be used. Technically, maybe (I didn't write a
patch). I say, semantically - how I understand the meaning of the word "in"
is not good to use it for generic alias of function arguments (when we have
out arguments too).

Pavel




> /Joel
>
>


Re: Bugs in pgoutput.c

2022-01-06 Thread Robert Haas
On Thu, Jan 6, 2022 at 2:58 PM Tom Lane  wrote:
> That might be okay for the system catalog entries, but I don't see
> how it prevents some other session from dropping the table entirely,
> thereby causing the on-disk storage to go away.  Is it guaranteed
> that logical decoding will never try to fetch any on-disk data?
> (I can sort of believe that that might be true, but there are scary
> corner cases for toasted data, such as an UPDATE that carries forward
> a pre-existing toast datum.)

If I'm not mistaken, logical decoding is only allowed to read data
from system catalog tables or tables that are flagged as being "like
system catalog tables for logical decoding purposes only". See
RelationIsAccessibleInLogicalDecoding and
RelationIsUsedAsCatalogTable. As far as the actual table data is
concerned, it has to be reconstructed solely from the WAL.

I am not sure what locking is required here and am not taking a
position on that ... but it's definitely not the case that a logical
decoding plugin can decide to just read from any old table it likes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Joel Jacobson
On Thu, Jan 6, 2022, at 20:24, Pavel Stehule wrote:
> But there is nothing similar in standard.
> Standard doesn't specify any column or table or label names in the custom 
> area. 

I think that's an unfair comparison.
This isn't at all the same thing as dictating column or table names.
I merely suggest reusing an existing reserved keyword.

>>čt 6. 1. 2022 v 20:03 odesílatel Joel Jacobson  napsal:
>>
>>If "in." would work, due to "in" being a reserved SQL keyword,
>>don't you think the benefits of a SQL standardized solution would outweigh our
>>personal preferences on what word each one of us prefer?
>
>I know that "in" is a reserved word in SQL, but I have not any knowledge of it 
>being used as alias in SQL functions or in >SQL/PSM functions. 

Are you concerned "in" might already be used as an alias somehow?

I did some testing:

"in" can be used as a column alias:

=> SELECT 123 AS in;
 in
-----
 123
(1 row)

But it cannot be used as a table alias, which is what matters:

=> WITH in AS (SELECT 1) SELECT * FROM in;
ERROR:  syntax error at or near "in"

=> SELECT * FROM t in;
ERROR:  syntax error at or near "in"

=> SELECT * FROM t AS in;
ERROR:  syntax error at or near "in"

It's also currently not possible to use it as a PL/pgSQL alias:

=> CREATE FUNCTION f(id int)
RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
in ALIAS FOR $1;
BEGIN
END
$$;
ERROR:  syntax error at or near "in"
LINE 5: in ALIAS FOR $1;

/Joel


Re: Bugs in pgoutput.c

2022-01-06 Thread Tom Lane
Amit Kapila  writes:
> On Thu, Jan 6, 2022 at 3:42 AM Tom Lane  wrote:
>> Also ... maybe I'm not looking in the right place, but I do not
>> see anything anywhere in logical decoding that is taking any lock
>> on the relation being processed.  How can that be safe?

> We don't need to acquire a lock on relation while decoding changes
> from WAL because it uses a historic snapshot to build a relcache entry
> and all the later changes to the rel are absorbed while decoding WAL.

That might be okay for the system catalog entries, but I don't see
how it prevents some other session from dropping the table entirely,
thereby causing the on-disk storage to go away.  Is it guaranteed
that logical decoding will never try to fetch any on-disk data?
(I can sort of believe that that might be true, but there are scary
corner cases for toasted data, such as an UPDATE that carries forward
a pre-existing toast datum.)

> * Would it be better if we move all the initialization done by patch
> in get_rel_sync_entry() to a separate function as I expect future
> patches might need to reset more things?

Don't see that it helps particularly.

> + * *during* a callback if we do any syscache or table access in the
> + * callback.

> As we don't take locks on tables, can invalidation events be accepted
> during table access? I could be missing something but I see relation.c
> accepts invalidation messages only when lock mode is not 'NoLock'.

The core point here is that you're assuming that NO code path taken
during logical decoding would try to take a lock.  I don't believe it,
at least not unless you can point me to some debugging cross-check that
guarantees it.

Given that we're interested in historic not current snapshots, I can
buy that it might be workable to manage syscache invalidations totally
differently than the way it's done in normal processing, in which case
(*if* it's done like that) maybe no invals would need to be recognized
while an output plugin is executing.  But (a) the comment here is
entirely wrong if that's so, and (b) I don't see anything in inval.c
that makes it work differently.

regards, tom lane




Re: Collecting statistics about contents of JSONB columns

2022-01-06 Thread Tomas Vondra




On 1/5/22 21:22, Simon Riggs wrote:

On Fri, 31 Dec 2021 at 22:07, Tomas Vondra
 wrote:


The patch does something far more
elegant - it simply uses stavalues to store an array of JSONB documents,
each describing stats for one path extracted from the sampled documents.


Sounds good.


I'm sure there's plenty open questions - for example I think we'll need
some logic to decide which paths to keep, otherwise the statistics can
get quite big, if we're dealing with large / variable documents. We're
already doing something similar for MCV lists.

One of Nikita's patches not included in this thread allow "selective"
statistics, where you can define in advance a "filter" restricting which
parts are considered interesting by ANALYZE. That's interesting, but I
think we need some simple MCV-like heuristics first anyway.

Another open question is how deep the stats should be. Imagine documents
like this:

{"a" : {"b" : {"c" : {"d" : ...

The current patch builds stats for all possible paths:

   "a"
   "a.b"
   "a.b.c"
   "a.b.c.d"

and of course many of the paths will often have JSONB documents as
values, not simple scalar values. I wonder if we should limit the depth
somehow, and maybe build stats only for scalar values.


The user interface for designing filters sounds hard, so I'd hope we
can ignore that for now.



Not sure I understand. I wasn't suggesting any user-defined filtering, 
but something done by default, similarly to what we do for regular MCV 
lists, based on frequency. We'd include frequent paths while excluding 
rare ones.


So no need for a user interface.

That might not work for documents with stable schema and a lot of 
top-level paths, because all the top-level paths have 1.0 frequency. But 
for documents with dynamic schema (different documents having different 
schemas/paths) it might help.


Similarly for the non-scalar values - I don't think we can really keep 
regular statistics on such values (for the same reason why it's not 
enough for whole JSONB columns), so why build/store that anyway.



Nikita did implement a way to specify custom filters using jsonpath, but 
I did not include that into this patch series. And questions regarding 
the interface were one of the reasons.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Pavel Stehule
čt 6. 1. 2022 v 20:03 odesílatel Joel Jacobson  napsal:

> On Thu, Jan 6, 2022, at 19:03, Pavel Stehule wrote:
> > The possibility to define a label dynamically is a better solution (not
> by some buildin keyword),
> > because it allows some possibility for the end user to define what he
> prefers.
>
> I'm trying to understand why you think a user-defined notation is
> desirable,
> why it wouldn't be better if the SQL standard would endorse a notation,
> so we could all write code in the same way, avoiding ugly GUCs or PRAGMAs
> altogether?
>

But there is nothing similar in standard. Standard doesn't specify any
column or table or label names in the custom area.

The Ada language, the origin of PL/SQL and hence of PL/pgSQL, has nothing
similar either. Moreover, it (the Ada language) was designed as a safe, very
verbose language without implicit conventions.

I think we have different positions, because we see different usage, based
on, probably, a different programming style. I can understand the request
for special common notation for access for routine parameters. But the "in"
keyword for this case is not good, and I really think it is better to give
some freedom to the user to choose their own label, if we don't know the
best one.



> If "in." would work, due to "in" being a reserved SQL keyword,
> don't you think the benefits of a SQL standardized solution would outweigh
> our
> personal preferences on what word each one of us prefer?
>

I know that "in" is a reserved word in SQL, but I have not any knowledge of
it being used as alias in SQL functions or in SQL/PSM functions.



> /Joel
>


Re: Collecting statistics about contents of JSONB columns

2022-01-06 Thread Tomas Vondra

On 1/1/22 22:16, Zhihong Yu wrote:

Hi,

+static JsonPathStats
+jsonStatsFindPathStats(JsonStats jsdata, char *path, int pathlen)

Stats appears twice in the method name. I think findJsonPathStats() 
should suffice.
It should check `if (jsdata->nullfrac >= 1.0)` 
as jsonStatsGetPathStatsStr does.


+JsonPathStats
+jsonStatsGetPathStatsStr(JsonStats jsdata, const char *subpath, int 
subpathlen)


This func can be static, right ?
I think findJsonPathStatsWithPrefix() would be a better name for the func.

+ * XXX Doesn't this need ecape_json too?
+ */
+static void
+jsonPathAppendEntryWithLen(StringInfo path, const char *entry, int len)
+{
+   char *tmpentry = pnstrdup(entry, len);
+   jsonPathAppendEntry(path, tmpentry);

escape_json() is called within jsonPathAppendEntry(). The XXX comment can 
be dropped.


+jsonPathStatsGetArrayIndexSelectivity(JsonPathStats pstats, int index)

It seems getJsonSelectivityWithArrayIndex() would be a better name.



Thanks. I'll think about the naming changes.


+       sel = scalarineqsel(NULL, operator,
+                           operator == JsonbGtOperator ||
+                           operator == JsonbGeOperator,
+                           operator == JsonbLeOperator ||
+                           operator == JsonbGeOperator,

Looking at the comment for scalarineqsel():

  *  scalarineqsel       - Selectivity of "<", "<=", ">", ">=" for scalars.
  *
  * This is the guts of scalarltsel/scalarlesel/scalargtsel/scalargesel.
  * The isgt and iseq flags distinguish which of the four cases apply.

It seems JsonbLtOperator doesn't appear in the call, can I ask why ?



Because the scalarineqsel signature is this

scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
  Oid collation,
  VariableStatData *vardata, Datum constval,
  Oid consttype)

so

/* is it greater or greater-or-equal */
isgt = operator == JsonbGtOperator ||
   operator == JsonbGeOperator

/* is it equality? */
iseq = operator == JsonbLeOperator ||
   operator == JsonbGeOperator,

So I think this is correct. A comment explaining this would be nice.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Index-only scan for btree_gist turns bpchar to char

2022-01-06 Thread Tom Lane
Japin Li  writes:
> On Thu, 06 Jan 2022 at 00:34, Tom Lane  wrote:
>> The minimum-effort fix would be to apply rtrim1 to both strings
>> in gbt_bpchar_consistent, but I wonder if we can improve on that
>> by pushing the ignore-trailing-spaces behavior further down.
>> I didn't look yet at whether gbt_var_consistent can support
>> any type-specific behavior.

> Adding type-specific for gbt_var_consistent looks like more generally.
> For example, for bpchar type, we should call bpchareq rather than texteq.

I looked at this and it does seem like it might work, as per attached
patch.  The one thing that is troubling me is that the opclass is set
up to apply gbt_text_same, which is formally the Wrong Thing for bpchar,
because the equality semantics shouldn't be quite the same.  But we
could not fix that without a module version bump, which is annoying.
I think that it might not be necessary to change it, because

(1) There's no such thing as unique GIST indexes, so it should not
matter if the "same" function is a bit stricter than the datatype's
nominal notion of equality.  It's certainly okay for that to vary
from the semantics applied by the consistent function --- GIST has
no idea that the consistent function is allegedly testing equality.

(2) If all the input values for a column have been coerced to the same
typmod, then it doesn't matter because two values that are equal after
space-stripping would be equal without space-stripping, too.

However, (2) doesn't hold for an existing index that the user has failed
to REINDEX, because then the index would contain some space-stripped
values that same() will not say are equal to incoming new values.
Again, I think this doesn't matter much, but maybe I'm missing
something.  I've not really dug into what GIST uses the same()
function for.

In any case, if we do need same() to implement the identical
behavior to bpchareq(), then the other solution isn't sufficient
either.

So in short, it seems like we ought to do some compatibility testing
and see if this code misbehaves at all with an index created by the
old code.  I don't particularly want to do that ... any volunteers?
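
As a sketch of what such a test could look like (made-up table and values):
build the index with the old module, then run a few inserts and queries
against it after installing the patched one, without reindexing:

-- built with the old btree_gist module:
CREATE EXTENSION btree_gist;
CREATE TABLE bp (c char(10));
INSERT INTO bp SELECT 'row' || g FROM generate_series(1, 1000) g;
CREATE INDEX ON bp USING gist (c);

-- after installing the patched module, with no REINDEX:
INSERT INTO bp SELECT 'new' || g FROM generate_series(1, 100) g;
SET enable_seqscan = off;
SELECT count(*) FROM bp WHERE c = 'row42';       -- expect 1
SELECT count(*) FROM bp WHERE c > 'row998';      -- range scan

The fresh inserts should exercise the same() path discussed above; any
problem ought to show up as wrong row counts from the index scans.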

regards, tom lane

diff --git a/contrib/btree_gist/btree_text.c b/contrib/btree_gist/btree_text.c
index 8019d11281..be0eac7975 100644
--- a/contrib/btree_gist/btree_text.c
+++ b/contrib/btree_gist/btree_text.c
@@ -90,6 +90,76 @@ static gbtree_vinfo tinfo =
 	NULL
 };
 
+/* bpchar needs its own comparison rules */
+
+static bool
+gbt_bpchargt(const void *a, const void *b, Oid collation, FmgrInfo *flinfo)
+{
+	return DatumGetBool(DirectFunctionCall2Coll(bpchargt,
+collation,
+PointerGetDatum(a),
+PointerGetDatum(b)));
+}
+
+static bool
+gbt_bpcharge(const void *a, const void *b, Oid collation, FmgrInfo *flinfo)
+{
+	return DatumGetBool(DirectFunctionCall2Coll(bpcharge,
+collation,
+PointerGetDatum(a),
+PointerGetDatum(b)));
+}
+
+static bool
+gbt_bpchareq(const void *a, const void *b, Oid collation, FmgrInfo *flinfo)
+{
+	return DatumGetBool(DirectFunctionCall2Coll(bpchareq,
+collation,
+PointerGetDatum(a),
+PointerGetDatum(b)));
+}
+
+static bool
+gbt_bpcharle(const void *a, const void *b, Oid collation, FmgrInfo *flinfo)
+{
+	return DatumGetBool(DirectFunctionCall2Coll(bpcharle,
+collation,
+PointerGetDatum(a),
+PointerGetDatum(b)));
+}
+
+static bool
+gbt_bpcharlt(const void *a, const void *b, Oid collation, FmgrInfo *flinfo)
+{
+	return DatumGetBool(DirectFunctionCall2Coll(bpcharlt,
+collation,
+PointerGetDatum(a),
+PointerGetDatum(b)));
+}
+
+static int32
+gbt_bpcharcmp(const void *a, const void *b, Oid collation, FmgrInfo *flinfo)
+{
+	return DatumGetInt32(DirectFunctionCall2Coll(bpcharcmp,
+ collation,
+ PointerGetDatum(a),
+ PointerGetDatum(b)));
+}
+
+static gbtree_vinfo bptinfo =
+{
+	gbt_t_bpchar,
+	0,
+	false,
+	gbt_bpchargt,
+	gbt_bpcharge,
+	gbt_bpchareq,
+	gbt_bpcharle,
+	gbt_bpcharlt,
+	gbt_bpcharcmp,
+	NULL
+};
+
 
 /**
  * Text ops
@@ -112,29 +182,8 @@ gbt_text_compress(PG_FUNCTION_ARGS)
 Datum
 gbt_bpchar_compress(PG_FUNCTION_ARGS)
 {
-	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
-	GISTENTRY  *retval;
-
-	if (tinfo.eml == 0)
-	{
-		tinfo.eml = pg_database_encoding_max_length();
-	}
-
-	if (entry->leafkey)
-	{
-
-		Datum		d = DirectFunctionCall1(rtrim1, entry->key);
-		GISTENTRY	trim;
-
-		gistentryinit(trim, d,
-	  entry->rel, entry->page,
-	  entry->offset, true);
-		retval = gbt_var_compress(, );
-	}
-	else
-		retval = entry;
-
-	PG_RETURN_POINTER(retval);
+	/* This should never have been distinct from gbt_text_compress */
+	return gbt_text_compress(fcinfo);
 }
 
 
@@ -179,18 +228,17 @@ gbt_bpchar_consistent(PG_FUNCTION_ARGS)
 	bool		retval;
 	GBT_VARKEY *key = (GBT_VARKEY 

Re: Collecting statistics about contents of JSONB columns

2022-01-06 Thread Tomas Vondra

On 1/1/22 16:33, Zhihong Yu wrote:

Hi,
For patch 1:

+   List       *statisticsName = NIL;   /* optional stats estimat. 
procedure */


I think if the variable is named estimatorName (or something similar), 
it would be easier for people to grasp its purpose.




I agree "statisticsName" might be too generic or confusing, but I'm not 
sure "estimator" is an improvement. Because this is not an "estimator" 
(in the sense of estimating selectivity). It "transforms" statistics to 
match the expression.



+       /* XXX perhaps full "statistics" wording would be better */
+       else if (strcmp(defel->defname, "stats") == 0)

I would recommend (stats sounds too general):

+       else if (strcmp(defel->defname, "statsestimator") == 0)

+       statisticsOid = ValidateStatisticsEstimator(statisticsName);

statisticsOid -> statsEstimatorOid



Same issue with the "estimator" bit :-(


For get_oprstat():

+   }
+   else
+       return (RegProcedure) InvalidOid;

keyword else is not needed (considering the return statement in if block).



True, but this is how the other get_ functions in lsyscache.c do it. 
Like get_oprjoin().



For patch 06:

+   /* FIXME Could be before the memset, I guess? Checking 
vardata->statsTuple. */

+   if (!data->statsTuple)
+       return false;

I would agree the check can be lifted above the memset call.



OK.

+ * XXX This does not really extract any stats, it merely allocates the 
struct?

+ */
+static JsonPathStats
+jsonPathStatsGetSpecialStats(JsonPathStats pstats, JsonPathStatsType type)

As the comment says, I think allocJsonPathStats() would be a better name for 
the func.


+ * XXX Why doesn't this do jsonPathStatsGetTypeFreq check similar to what
+ * jsonPathStatsGetLengthStats does?

I think `jsonPathStatsGetTypeFreq(pstats, jbvArray, 0.0) <= 0.0` check 
should be added for jsonPathStatsGetArrayLengthStats().


To be continued ...


OK. I'll see if Nikita has some ideas about the naming changes.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Joel Jacobson
On Thu, Jan 6, 2022, at 19:03, Pavel Stehule wrote:
> The possibility to define a label dynamically is a better solution (not by 
> some buildin keyword),
> because it allows some possibility for the end user to define what he prefers.

I'm trying to understand why you think a user-defined notation is desirable,
why it wouldn't be better if the SQL standard would endorse a notation,
so we could all write code in the same way, avoiding ugly GUCs or PRAGMAs 
altogether?

If "in." would work, due to "in" being a reserved SQL keyword,
don't you think the benefits of a SQL standardized solution would outweigh our
personal preferences on what word each one of us prefer?

/Joel

Storage for multiple variable-length attributes in a single row

2022-01-06 Thread Esteban Zimanyi
Dear all

When ingesting mobility (IoT) data into MobilityDB
https://mobilitydb.com/
we transform very wide (2K attributes) car mobility data of high frequency
(every tenth of a second) from flat format (e.g. CSV) into MobilityDB
format in which there is a single record per trip and each of the signals
is transformed into a temporal attribute (tbool, tint, tfloat, ttext,
tgeompoint, tgeogpoint), which are temporal extensions of the corresponding
PostgreSQL/PostGIS base types (bool, int, float, text, geometry,
geography). All temporal types are stored using extended format, e.g.,
CREATE TYPE tfloat (
  internallength = variable,
  [...]
  storage = extended,
  alignment = double,
  [...]
);

Given that each temporal value can be very wide (on average 30K
timestamped points/floats/text/... per trip), our first question is:
* Is extended the right storage for this?

Our second question is how all the 2K temporal attributes are stored, which
may be
* in a single table space
* in one table space per attribute
which, in other words, relates to the question of row vs column storage.
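
For reference, we have been looking at the per-column storage setting, which
can be inspected and changed with plain SQL (table and column names below
are only placeholders):

-- 'x' = extended, 'e' = external, 'm' = main, 'p' = plain
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'trips'::regclass AND attnum > 0;

-- e.g. keep a very wide temporal attribute out of line but uncompressed
ALTER TABLE trips ALTER COLUMN speed SET STORAGE EXTERNAL;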

Many thanks for your insight

Esteban


Re: SQL:2011 application time

2022-01-06 Thread Corey Huinker
>
>
>  But
> the standard says that dropping system versioning should automatically
> drop all historical records (2 under Part 2: Foundation, 11.30 <drop system versioning clause>). That actually makes sense though: when you
> do DML we automatically update the start/end columns, but we don't
> save copies of the previous data (and incidentally the end column will
> always be the max value.)


This is what I was referring to when I mentioned a side-table.
Deleting history would be an O(1) operation. Any other
misunderstandings are all mine.


Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Pavel Stehule
>
>
>
> I accept that both issues are valid, and I don't think we can find a good
> design that solves both issues, because there are different objective
> expectations.
>

I accept that both issues are valid, and I don't think we can find a
**one** good design that solves both issues, because there are different
objective expectations.


>
> Regards
>
> Pavel
>
>
>
>> /Joel
>>
>>
>>
>>
>>


Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Pavel Stehule
čt 6. 1. 2022 v 18:18 odesílatel Joel Jacobson  napsal:

> On Thu, Jan 6, 2022, at 17:55, Tom Lane wrote:
> > Even if it works today, we could be letting ourselves in for future
> > trouble.  The SQL standard is a moving target, and they could easily
> > standardize some future syntax involving IN that creates a problem here.
>
> Perhaps the "in." notation could be standardized by the SQL standard,
> allowing vendors to use such notation without fear of future trouble?
>
> > I think we already have two perfectly satisfactory answers:
> > * qualify parameters with the function name to disambiguate them;
> > * use the ALIAS feature to create an internal, shorter name.
>
> I would say we have two OK workarounds, far from perfect:
> * Qualifying parameters is too verbose. Function names can be long.
> * Having to remap parameters using ALIAS is cumbersome.
>
> This problem is one of my top annoyances with PL/pgSQL.
>

It depends on the programming style. I accept that this can be a real
issue, although I have needed it only a few times, and I have not
received any feedback from customers about it.

We can probably agree that some people will use it often and others only a
few times. If you need it every time, then you need some global solution.
If you use it only a few times, then a local verbose solution will be
preferred, and a global solution can be the source of new issues.

Maybe it can help if we accept that there are two different problems, and
that we should design two different solutions.

1. how to set the plpgsql root namespace globally
2. how to set the plpgsql root namespace locally

@1 can be solved by a (very dirty) GUC plpgsql.root_ns_alias (this is just a
first shot, nothing more)
@2 can be solved by #option root_ns_alias or by extending DECLARE with the
syntax "fx ALIAS FOR LABEL functionname"

I accept that both issues are valid, and I don't think we can find a good
design that solves both issues, because there are different objective
expectations.

Regards

Pavel



> /Joel
>
>
>
>
>


Re: Deduplicate min restart_lsn calculation code

2022-01-06 Thread Alvaro Herrera
On 2022-Jan-06, Bharath Rupireddy wrote:

> Hi,
> 
> It seems like the two functions ReplicationSlotsComputeRequiredLSN and
> ReplicationSlotsComputeLogicalRestartLSN more or less does the same
> thing which makes me optimize (saving 40 LOC) it as attached. I'm
> pretty much okay if it gets rejected on the grounds that it creates a
> lot of diff with the older versions and the new API may not look
> nicer, still I want to give it a try.
> 
> Thoughts?

Hmm, it seems sensible to me.  But I would not have the second boolean
argument in the new function, and instead have the caller save the
return value in a local variable to do the XLogSetReplicationSlotMinimumLSN
step separately.  Then the new function API is not so strange.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Linux transformó mi computadora, de una `máquina para hacer cosas',
en un aparato realmente entretenido, sobre el cual cada día aprendo
algo nuevo" (Jaime Salinas)




Re: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output

2022-01-06 Thread Peter Geoghegan
On Tue, Dec 21, 2021 at 11:57 PM Masahiko Sawada  wrote:
> I've looked at the patch and here are comments:

Thanks!

The patch bitrot again, so attached is a rebased version, v3.

> I think we should set message_level. Otherwise, index AM will set an
> invalid log level, although any index AM in core seems not to use it.

Fixed.

> ---
> -   /*
> -* Update error traceback information.  This is the
> last phase during
> -* which we add context information to errors, so we
> don't need to
> -* revert to the previous phase.
> -*/
>
> Why is this comment removed? ISTM this comment is still valid.

We don't "revert to the previous phase" here, which is always
VACUUM_ERRCB_PHASE_SCAN_HEAP in practice, per the comment -- but that
doesn't seem significant. It's not just unnecessary to do so, as the
comment claims -- it actually seems *wrong*. That is, it would be
wrong to go back to VACUUM_ERRCB_PHASE_SCAN_HEAP here, since we're
completely finished scanning the heap at this point.

There is still perhaps a small danger that somebody will forget to add
a new VACUUM_ERRCB_PHASE_* for some new kind of work that happens
after this point, at the very last moment. But that would be equally
true if the new kind of work took place earlier, inside
lazy_scan_heap(). And so the last call to update_vacuum_error_info()
isn't special compared to any other update_vacuum_error_info() call
(or any other call that doesn't set a saved_err_info).

--
Peter Geoghegan


v3-0001-Unify-VACUUM-VERBOSE-and-log_autovacuum_min_durat.patch
Description: Binary data


Re: psql: \dl+ to list large objects privileges

2022-01-06 Thread Tom Lane
Pavel Luzanov  writes:
>> I wonder if we shouldn't put these new tests in largeobject.sql instead.
>> (That would mean there are two expected-files to change, which is a bit
>> tedious, but it's not very hard.)

> I made the same changes to two files in the 'expected' directory: 
> largeobject.out and largeobject_1.out.
> Although I must say that I still can't understand why more than one file 
> with expected result is used for some tests.

Because the output sometimes varies across platforms.  IIRC, the
case where largeobject_1.out is needed is for Windows, where the
file that gets inserted into one of the LOs might contain CR/LF
not just LF newlines, so the LO contents look different.

> Also, I decided to delete following line in the listLargeObjects 
> function because all the other commands in describe.c do not contain it:
>      myopt.topt.tuples_only = false;

Agreed, I'd done that already in my version of the patch.

> Second version (after splitting) is attached.

Pushed with some minor editorialization.

regards, tom lane




Re: SQL:2011 application time

2022-01-06 Thread Paul A Jungwirth
On Thu, Jan 6, 2022 at 6:45 AM Vik Fearing  wrote:
>
> On 1/5/22 11:03 PM, Corey Huinker wrote:
> >
> > There was similar work being done for system periods, which are a bit
> > simpler but require a side (history) table to be created.
>
> This is false.  SYSTEM_TIME periods do not need any kind of history.
> This was one of the problems I had with Surafel's attempt because it was
> confusing the period with SYSTEM VERSIONING.  Versioning needs the
> period but the inverse is not true.

This is an interesting point. Syntactically, there are three different
things: the generated start/end columns, the period declaration, and
the WITH SYSTEM VERSIONING modifier to the table. You could declare a
system period without making the table versioned. Practically speaking
I don't know why you'd ever create a system period without a versioned
table (do you know of any uses Vik?), but perhaps we can exploit the
separation to add system periods in the same patch that adds
application periods.

The first two bits of syntax *are* tied together: you need columns
with GENERATED ALWAYS AS ROW START/END to declare the system period,
and less intuitively the standard says you can't use AS ROW START/END
unless those columns appear in a system period (2.e.v.2 under Part 2:
Foundation, 11.3 <table definition>). Personally I'd be willing to
ignore that latter requirement. For one thing, what does Postgres do
with the columns if you drop the period? Dropping the columns
altogether seems very harsh, so I guess you'd just remove the
GENERATED clause.
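
To make the three pieces concrete, the standard's surface syntax (not
something Postgres accepts today, shown only to illustrate how the parts
separate) is roughly:

CREATE TABLE accounts (
    id        integer,
    balance   numeric,
    sys_start timestamp GENERATED ALWAYS AS ROW START,
    sys_end   timestamp GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (sys_start, sys_end)
) WITH SYSTEM VERSIONING;    -- this last clause is the optional third piece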

Another weird thing is that you don't (can't) say STORED for those
columns. But they are certainly stored somewhere. I would store the
values just like any other column (even if non-current rows get moved
to a separate table). Also then you don't have to do anything extra
when the GENERATED clause is dropped.

If we wanted to support system-time periods without building all of
system versioning, what would that look like? At first I thought it
would be a trivial addition to part-1 of the patch here, but the more
I think about it the more it seems to deserve its own patch.

One rule I think we should follow is that using a non-system-versioned
table (with a system period) should get you to the same place as using
a system-versioned table and then removing the system versioning. But
the standard says that dropping system versioning should automatically
drop all historical records (2 under Part 2: Foundation, 11.30 <drop system versioning clause>). That actually makes sense though: when you
do DML we automatically update the start/end columns, but we don't
save copies of the previous data (and incidentally the end column will
always be the max value.) So there is a use case, albeit a thin one:
you get a Rails-like updated_at column that is maintained
automatically by your RDBMS. That is pretty easy, but I think I'd
still break it out into a separate patch. I'm happy to work on that as
something that builds on top of my part-1 patch here.

Yours,
Paul




Re: Add jsonlog log_destination for JSON server logs

2022-01-06 Thread Andrew Dunstan


On 1/5/22 02:32, Michael Paquier wrote:
> On Sun, Jan 02, 2022 at 01:34:45PM -0800, Andres Freund wrote:
>> The tests don't seem to pass on windows:
>> https://cirrus-ci.com/task/5412456754315264?logs=test_bin#L47
>> https://api.cirrus-ci.com/v1/artifact/task/5412456754315264/tap/src/bin/pg_ctl/tmp_check/log/regress_log_004_logrotate
>>
>> psql::1: ERROR:  division by zero
>> could not open 
>> "c:/cirrus/src/bin/pg_ctl/tmp_check/t_004_logrotate_primary_data/pgdata/current_logfiles":
>>  The system cannot find the file specified at t/004_logrotate.pl line 87.
> This seems to point out that the syslogger is too slow to capture the
> logrotate signal, and the patch set is introducing nothing new in
> terms of infrastructure, just an extra value for log_destination.
> This stuff passes here, and I am not spotting something amiss after an
> extra close read.
>
> Attached is an updated patch set that increases the test timeout (5min
> -> 10min).  That should help, I assume.


ITYM 3 min -> 6  min. But in any case, is that really going to solve
this? The file should exist, even if its contents are not up to date, AIUI.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Pavel Stehule
čt 6. 1. 2022 v 17:48 odesílatel Joel Jacobson  napsal:

> On Thu, Jan 6, 2022, at 17:10, Pavel Stehule wrote:
> > I understand well, and I don't think it's nice.
> >
> > Are there some similar features in other programming languages?
>
> It would be similar to "this" in Javascript/Java/C++,
> but instead using "in" to access function parameters.
>
> Currently, we need to prefix the parameter name if it's in conflict with a
> column name:
>
> CREATE FUNCTION very_long_function_name(id int, some_value text)
> RETURNS boolean
> LANGUAGE sql AS $$
> UPDATE some_table
> SET some_value = very_long_function_name.some_value
> WHERE id = very_long_function_name.id RETURNING TRUE
> $$;
>
> This is cumbersome as function names can be long, and if changing the
> function name,
> you would need to update all occurrences of the function name in the code.
>
> If we could instead refer to the parameters by prefixing them with "in.",
> we could write:
>
> CREATE FUNCTION very_long_function_name(id int, some_value text)
> RETURNS boolean
> LANGUAGE sql AS $$
> UPDATE some_table
> SET some_value = in.some_value
> WHERE id = in.id RETURNING TRUE
> $$;
>
> I think this would be nice, even if it would only work for IN parameters,
> since you seldom need to access OUT parameters in the problematic
> WHERE-clauses anyway.
> I mostly use OUT parameters when setting them on a separate row:
> some_out_var := some_value;
> ...or, when SELECTing INTO an OUT parameter, which wouldn't be a problem.
>

There is full agreement on the description of the issue; I just don't like
the proposed solution. The word "in" is not very practical (there are OUT
and INOUT parameters too), and it goes against the philosophy of Ada, where
all labels are parametrized (there is no built-in label). The Ada language
has two things that plpgsql unfortunately does not have: a) the possibility
to modify (configure) the compiler with a PRAGMA directive, b) the
possibility to define a PRAGMA at more levels: package, function, block.
The possibility to define a label dynamically is a better solution (rather
than some built-in keyword), because it gives the end user the freedom to
define what he prefers. For some cases "in" can be ok, but when you have
only two OUT variables, then "in" looks a little bit strange; I prefer
"fx", and other people may prefer "a" or "args".

There is no technical problem in defining an alias for the top namespace.
The problem is in the syntax and in forcing this setting onto some set of
routines. Theoretically we can have a GUC plpgsql.rootns. I can set it to
"fx", and you can set it to "in", and we can both be happy. But the real
question is how to force this setting on all functions. A GUC can be
overwritten in a session, and although you can set this GUC for every
function (by an option or by an assigned GUC), it is not nice, and somebody
can forget about this setting. For me, there are more interesting
(important) issues than the possibility to specify the root namespace,
though it can be nice to control. What I miss is a configuration mechanism
independent of GUCs that is static (and that emulates static compile
options) and that can be assigned to a database (as a synonym for a
project) or a schema (as a synonym for a module or package). With such a
mechanism this thread would be significantly shorter, and the whole
plpgsql2 discussion would not have happened.


>
> > you can check it. It is true, so IN is usually followed by "(", but
> until check I am not able to say if there will be an unwanted
> > shift or collision or not.
>
> I checked gram.y, and IN_P is never followed by '.', but not sure if it
> could cause problems anyway, hope someone with parser knowledge can comment
> on this.
>
> /Joel
>


Re: Suggestion: optionally return default value instead of error on failed cast

2022-01-06 Thread Corey Huinker
On Thu, Jan 6, 2022 at 12:18 PM Andrew Dunstan  wrote:

>
> On 1/4/22 22:17, Corey Huinker wrote:
> >
> > currently a failed cast throws an error. It would be useful to have a
> > way to get a default value instead.
> >
> >
> > I've recently encountered situations where this would have been
> > helpful. Recently I came across some client code:
> >
> > CREATE OR REPLACE FUNCTION is_valid_json(str text) RETURNS boolean
> > LANGUAGE PLPGSQL
> > AS $$
> > DECLARE
> > j json;
> > BEGIN
> > j := str::json;
> > return true;
> > EXCEPTION WHEN OTHERS THEN return false;
> > END
> > $$;
> >
> >
> > This is a double-bummer. First, the function discards the value so we
> > have to recompute it, and secondly, the exception block prevents the
> > query from being parallelized.
>
>
> This particular case is catered for in the SQL/JSON patches which
> several people are currently reviewing:
>
>
That's great to know, but it would still be parsing the json twice, once to
learn that it is legit json, and once to get the casted value.

Also, I had a similar issue with type numeric, so having generic "x is a
type_y" support would essentially do everything that a try_catch()-ish
construct would need to do, and be more generic.
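
To illustrate the kind of thing I mean (purely hypothetical syntax, and
made-up table/column names), the numeric case could then be written without
any exception block:

-- hypothetical: yield a default instead of raising on a failed cast
SELECT CAST(raw_value AS numeric DEFAULT NULL ON ERROR) AS val
FROM staging_table;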


Re: Emit "checkpoint skipped because system is idle" message at LOG level if log_checkpoints is set

2022-01-06 Thread Justin Pryzby
On Wed, Jan 05, 2022 at 10:45:06AM +0530, Dilip Kumar wrote:
> +1 to convert to LOG when log_checkpoints is set.

On Thu, Jan 06, 2022 at 04:34:38PM +0900, Kyotaro Horiguchi wrote:
> Agreed. -1 to just raising elevel of the message.

On Thu, Jan 06, 2022 at 06:58:14PM +0800, Julien Rouhaud wrote:
> -1 too.
> 
> > If someone keen to show some debug messages, it is viable for
> > arbitrary messages by lowering log_min_messages then inserting a
> > custom filter to emit_log_hook.  It invites some overhead on
> > irrelevant processes, but that overhead would be avoidable with a
> > *bit* dirty hack in the filter,
> > 
> > We might want to discuss more convenient or cleaner way to get the
> > same result.
> 
> We could add a checkpoint_skipped counter to pg_stat_bgwriter for instance.

+1 (cc Melanie)

Bharath: there's no agreement that this behavior change is desirable, so I
suggest to close the CF entry.

Actually, I suggest to not immediately create CF entries; instead, wait to see
if there's any objections or discussion.  FWIW, I try to wait a day before
creating a CF entry, since the scope/goal/desirability of a thread can change
dramatically.  This avoids burdening reviewers with the task of later
discussing whether it's okay to close a CF entry.

-- 
Justin




Re: Suggestion: optionally return default value instead of error on failed cast

2022-01-06 Thread Andrew Dunstan


On 1/4/22 22:17, Corey Huinker wrote:
>
> currently a failed cast throws an error. It would be useful to have a
> way to get a default value instead.
>
>
> I've recently encountered situations where this would have been
> helpful. Recently I came across some client code:
>
> CREATE OR REPLACE FUNCTION is_valid_json(str text) RETURNS boolean
> LANGUAGE PLPGSQL
> AS $$
> DECLARE
>     j json;
> BEGIN
>     j := str::json;
>     return true;
> EXCEPTION WHEN OTHERS THEN return false;
> END
> $$;
>
>
> This is a double-bummer. First, the function discards the value so we
> have to recompute it, and secondly, the exception block prevents the
> query from being parallelized.


This particular case is catered for in the SQL/JSON patches which
several people are currently reviewing:


andrew=# select 'foo' is json;
 ?column?
--
 f
(1 row)

andrew=# select '"foo"' is json;
 ?column?
--
 t
(1 row)


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Joel Jacobson
On Thu, Jan 6, 2022, at 17:55, Tom Lane wrote:
> Even if it works today, we could be letting ourselves in for future
> trouble.  The SQL standard is a moving target, and they could easily
> standardize some future syntax involving IN that creates a problem here.

Perhaps the "in." notation could be standardized by the SQL standard,
allowing vendors to use such notation without fear of future trouble?

> I think we already have two perfectly satisfactory answers:
> * qualify parameters with the function name to disambiguate them;
> * use the ALIAS feature to create an internal, shorter name.

I would say we have two OK workarounds, far from perfect:
* Qualifying parameters is too verbose. Function names can be long.
* Having to remap parameters using ALIAS is cumbersome.

This problem is one of my top annoyances with PL/pgSQL.
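
For comparison, the ALIAS workaround for the same example looks like this
(a minimal sketch with the same made-up names; note it also forces you into
PL/pgSQL rather than LANGUAGE sql):

CREATE FUNCTION very_long_function_name(id int, some_value text)
RETURNS boolean
LANGUAGE plpgsql AS $$
DECLARE
    v_id ALIAS FOR id;
    v_some_value ALIAS FOR some_value;
BEGIN
    UPDATE some_table
    SET some_value = v_some_value
    WHERE some_table.id = v_id;
    RETURN true;
END
$$;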

/Joel





Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit)

2022-01-06 Thread Justin Pryzby
There's no leak after running for ~5 weeks.

$ ps -O lstart,vsize,rss 17930
  PID  STARTEDVSZ   RSS S TTY  TIME COMMAND
17930 Tue Nov 30 15:35:26 2021 1019464 117424 S ?  7-04:54:03 postgres: 
telsasoft ts 192.168.122.13(57640) idle

Unless you suggest otherwise , I'm planning to restart the DB soon and go back
to running the pgdg rpm binaries with jit=off rather than what I compiled and
patched locally.

On Thu, Nov 18, 2021 at 03:20:39PM -0600, Justin Pryzby wrote:
> On Wed, Nov 10, 2021 at 09:56:44AM -0600, Justin Pryzby wrote:
> > Thread starting here:
> > https://www.postgresql.org/message-id/20201001021609.GC8476%40telsasoft.com
> > 
> > On Fri, Dec 18, 2020 at 05:56:07PM -0600, Justin Pryzby wrote:
> > > I'm 99% sure the "bad_alloc" is from LLVM.  It happened multiple times on
> > > different servers (running a similar report) after setting jit=on during 
> > > pg13
> > > upgrade, and never happened since re-setting jit=off.
> > 
> > Since this recurred a few times recently (now running pg14.0), and I finally
> > managed to get a non-truncated corefile...
> 
> I think the reason this recurred is that, since upgrading to pg14, I no longer
> had your memleak patches applied.  I'd forgotten about it, but was probably
> running a locally compiled postgres with your patches applied.
> 
> I should've mentioned that this crash was associated with the message from the
> original problem report:
> 
> |terminate called after throwing an instance of 'std::bad_alloc'
> |  what():  std::bad_alloc
> 
> The leak discussed on other threads seems fixed by your patches - I compiled
> v14 and now running with no visible leaks since last week.
> https://www.postgresql.org/message-id/flat/20210417021602.7dilihkdc7obl...@alap3.anarazel.de
> 
> As I understand it, there's still an issue with an allocation failure causing
> SIGABRT rather than FATAL.
> 
> It took me several tries to get the corefile since the process is huge, caused
> by the leak (and abrtd wanted to truncate it, nullifying its utility).
> 
> -rw---. 1 postgres postgres 8.4G Nov 10 08:57 
> /var/lib/pgsql/14/data/core.31345
> 
> I installed more debug packages to get a fuller stacktrace.
> 
> #0  0x7f2497880337 in raise () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x7f2497881a28 in abort () from /lib64/libc.so.6
> No symbol table info available.
> #2  0x7f2487cbf265 in __gnu_cxx::__verbose_terminate_handler() () from 
> /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #3  0x7f2487c66696 in __cxxabiv1::__terminate(void (*)()) () from 
> /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #4  0x7f2487c666c3 in std::terminate() () from 
> /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #5  0x7f2487c687d3 in __cxa_throw () from 
> /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #6  0x7f2487c686cd in operator new(unsigned long) () from 
> /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #7  0x7f2486477b9c in allocateBuckets (this=0x2ff7f38, this=0x2ff7f38, 
> Num=) at 
> /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:753
> No locals.
> #8  llvm::DenseMap std::default_delete >, llvm::DenseMapAPIntKeyInfo, 
> llvm::detail::DenseMapPair std::default_delete > > >::grow 
> (this=this@entry=0x2ff7f38, AtLeast=)
> at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:691
> OldNumBuckets = 33554432
> OldBuckets = 0x7f23f3e42010
> #9  0x7f2486477f29 in grow (AtLeast=, this=0x2ff7f38) at 
> /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:461
> No locals.
> #10 InsertIntoBucketImpl (TheBucket=, Lookup=..., 
> Key=..., this=0x2ff7f38) at 
> /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:510
> NewNumEntries = 
> EmptyKey = 
> #11 InsertIntoBucket (Key=..., TheBucket=, 
> this=0x2ff7f38) at 
> /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:471
> No locals.
> #12 FindAndConstruct (Key=..., this=0x2ff7f38) at 
> /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:271
> TheBucket = 
> #13 operator[] (Key=..., this=0x2ff7f38) at 
> /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:275
> No locals.
> #14 llvm::ConstantInt::get (Context=..., V=...) at 
> /usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:550
> pImpl = 0x2ff7eb0
> #15 0x7f2486478263 in llvm::ConstantInt::get (Ty=0x2ff85a8, V= out>, isSigned=isSigned@entry=false) at 
> /usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:571
> No locals.
> #16 0x7f248648673d in LLVMConstInt (IntTy=, N= out>, SignExtend=SignExtend@entry=0) at 
> /usr/src/debug/llvm-5.0.1.src/lib/IR/Core.cpp:952
> No locals.
> #17 0x7f2488f66c18 in l_ptr_const (type=0x3000650, ptr=) 
> at ../../../../src/include/jit/llvmjit_emit.h:29
> c = 
> #18 llvm_compile_expr (state=) at llvmjit_expr.c:246
> op 

Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Tom Lane
Pavel Stehule  writes:
> čt 6. 1. 2022 v 16:59 odesílatel Joel Jacobson  napsal:
>> How could the SQL parser have a problem with it, if "in" is currently
>> never followed by "." (dot)?

> you can check it. It is true, so IN is usually followed by "(", but until
> check I am not able to say if there will be an unwanted shift or collision
> or not.

Even if it works today, we could be letting ourselves in for future
trouble.  The SQL standard is a moving target, and they could easily
standardize some future syntax involving IN that creates a problem here.

I think we already have two perfectly satisfactory answers:
* qualify parameters with the function name to disambiguate them;
* use the ALIAS feature to create an internal, shorter name.
I see no problem here that is worth locking ourselves into unnecessary
syntax assumptions to fix.

regards, tom lane




Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Joel Jacobson
On Thu, Jan 6, 2022, at 17:10, Pavel Stehule wrote:
> I understand well, and I don't think it's nice. 
>
> Are there some similar features in other programming languages? 

It would be similar to "this" in Javascript/Java/C++,
but instead using "in" to access function parameters.

Currently, we need to prefix the parameter name if it's in conflict with a 
column name:

CREATE FUNCTION very_long_function_name(id int, some_value text)
RETURNS boolean
LANGUAGE sql AS $$
UPDATE some_table
SET some_value = very_long_function_name.some_value
WHERE id = very_long_function_name.id RETURNING TRUE
$$;

This is cumbersome as function names can be long, and if changing the function 
name,
you would need to update all occurrences of the function name in the code.

If we could instead refer to the parameters by prefixing them with "in.", we 
could write:

CREATE FUNCTION very_long_function_name(id int, some_value text)
RETURNS boolean
LANGUAGE sql AS $$
UPDATE some_table
SET some_value = in.some_value
WHERE id = in.id RETURNING TRUE
$$;

I think this would be nice, even if it would only work for IN parameters,
since you seldom need to access OUT parameters in the problematic WHERE-clauses 
anyway.
I mostly use OUT parameters when setting them on a separate row:
some_out_var := some_value;
...or, when SELECTing INTO an OUT parameter, which wouldn't be a problem.

> you can check it. It is true, so IN is usually followed by "(", but until 
> check I am not able to say if there will be an unwanted
> shift or collision or not.

I checked gram.y, and IN_P is never followed by '.', but not sure if it could 
cause problems anyway, hope someone with parser knowledge can comment on this.

/Joel

Deduplicate min restart_lsn calculation code

2022-01-06 Thread Bharath Rupireddy
Hi,

It seems like the two functions ReplicationSlotsComputeRequiredLSN and
ReplicationSlotsComputeLogicalRestartLSN more or less does the same
thing which makes me optimize (saving 40 LOC) it as attached. I'm
pretty much okay if it gets rejected on the grounds that it creates a
lot of diff with the older versions and the new API may not look
nicer, still I want to give it a try.

Thoughts?

Regards,
Bharath Rupireddy.


v1-0001-deduplicate-min-restart_lsn-calculation-code.patch
Description: Binary data


Re: SQL/JSON: functions

2022-01-06 Thread Andrew Dunstan


On 1/6/22 06:24, Himanshu Upadhyaya wrote:
> I have one general question on the below scenario.
> CREATE TABLE T (Id INTEGER PRIMARY KEY,Jcol CHARACTER VARYING ( 5000
> )CHECK ( Jcol IS JSON ) );
> insert into T values (1,323);
>  ORACLE is giving an error(check constraint...violated ORA-06512) for
> the above insert but Postgres is allowing it, however is not related
> to this patch but just thinking if this is expected.
>
> ‘postgres[22198]=#’SELECT * FROM T WHERE Jcol IS JSON;
>  id | jcol
> +--
>   1 | 323
> How come number 323 is the valid json?


If you look at the JSON grammar at 
or

it's clear that a bare number is valid json. Our parser implements that
grammar pretty faithfully, in fact rather more faithfully than many
implementations (e.g. we allow huge number strings). So as far as I'm
concerned, we are right and Oracle is wrong. It would hardly be the
first time such a thing has happened.

Oracle is not the definer of the JSON standard. ECMA is.
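
As an aside, if the intent of such a constraint is to reject bare scalars,
the standard predicate takes a modifier; assuming the patch set implements
that part (I haven't double-checked), the check could be written as:

CREATE TABLE t (
    id   integer PRIMARY KEY,
    jcol varchar(5000) CHECK (jcol IS JSON OBJECT)
);
INSERT INTO t VALUES (1, 323);          -- now rejected by the constraint
INSERT INTO t VALUES (2, '{"a": 1}');   -- ok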


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-06 Thread Finnerty, Jim
Re:Most software has a one-stage upgrade model. What you propose would
have us install 2 things, with a step in-between, which makes it
harder to manage.

The intended benefit would be that the code doesn't need to handle the 
possibility of 2 different XID representations for the indefinite future.  

I agree that VACUUM would be the preferred tool to make room for the special 
data area so that there is no need to install a separate tool, though, whether 
this work happens before or after the upgrade. 

Re: 1. Upgrade, with important aspect not-enabled-yet, but everything else 
working - all required software is delivered in one shot, with fast upgrade

Let's clarify what happens during upgrade.  What format are the pages in 
immediately after the upgrade? 

2. As each table is VACUUMed, we confirm/clean/groom data blocks so
each table is individually confirmed as being ready. The pace that
this happens at is under user control.

What are VACUUM's new responsibilities in this phase?  VACUUM needs a new task 
that confirms when there exists no heap page for a table that is not ready.

If upgrade put all the pages into either double-xmax or double-epoch 
representation, then VACUUM's responsibility could be to split the double-xmax 
pages into the double-epoch representation and verify when there exists no 
double-xmax pages.

3. When all tables have been prepared, then restart to allow xid64 format 
usage

Let's also clarify what happens at restart time.

If we were to do the upgrade before preparing space in advance, is there a way 
to ever remove the code that knows about the double-xmax XID representation?





Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Pavel Stehule
čt 6. 1. 2022 v 16:59 odesílatel Joel Jacobson  napsal:

> On Thu, Jan 6, 2022, at 15:05, Pavel Stehule wrote:
> >>čt 6. 1. 2022 v 14:28 odesílatel Joel Jacobson 
> napsal:
> >>How about using the existing reserved keyword "in" followed by "." (dot)
> and then the function parameter name?
> >>
> >>This idea is based on the assumption "in." would always be a syntax
> error everywhere in both SQL and PL/pgSQL,
> >>so if we would introduce such a syntax, no existing code could be
> affected, except currently invalid code.
> >>
> >>I wouldn't mind using "in." to refer to IN/OUT/INOUT parameters and not
> only IN ones, it's a minor confusion that could be >>explained in the docs.
> >
> >You are right, in.outvar looks messy.
>
> I think you misunderstood what I meant, I suggested "in.outvar" would
> actually be OK.
>

I understand well, and I don't think it's nice.

Are there some similar features in other programming languages?


> > Moreover, maybe the SQL parser can have a problem with it.
>
> How could the SQL parser have a problem with it, if "in" is currently
> never followed by "." (dot)?
> Not an expert in the SQL parser, trying to understand why it would be a
> problem.
>

you can check it. It is true, so IN is usually followed by "(", but until
check I am not able to say if there will be an unwanted shift or collision
or not.

Regards

Pavel





>
> /Joel
>


Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Joel Jacobson
On Thu, Jan 6, 2022, at 15:05, Pavel Stehule wrote:
>>čt 6. 1. 2022 v 14:28 odesílatel Joel Jacobson  napsal:
>>How about using the existing reserved keyword "in" followed by "." (dot) and 
>>then the function parameter name?
>>
>>This idea is based on the assumption "in." would always be a syntax error 
>>everywhere in both SQL and PL/pgSQL,
>>so if we would introduce such a syntax, no existing code could be affected, 
>>except currently invalid code.
>>
>>I wouldn't mind using "in." to refer to IN/OUT/INOUT parameters and not only 
>>IN ones, it's a minor confusion that could be >>explained in the docs.
>
>You are right, in.outvar looks messy.

I think you misunderstood what I meant, I suggested "in.outvar" would actually 
be OK.

> Moreover, maybe the SQL parser can have a problem with it. 

How could the SQL parser have a problem with it, if "in" is currently never 
followed by "." (dot)?
Not an expert in the SQL parser, trying to understand why it would be a problem.

/Joel

Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

2022-01-06 Thread Jelte Fennema
Attached are 3 patches that address the feedback from Andres about code 
duplication 
and splitting up commits. I completely removed internal_cancel now, since it 
only had 
one caller at this point.

> IMO, this is a new feature not a bug fix

IMO this is definitely a bugfix. Nowhere in the libpq docs is it stated that
the connection options in question do not apply to connections that are
opened for cancellations. So as a user I definitely expect that any
connections that libpq opens would use these options. Which is why I
definitely consider it a bug that they are currently not honoured for
cancel requests.

However, even though I think it's a bugfix, I can understand being hesitant
to backport this. IMHO in that case at least the docs should be updated to
explain this discrepancy. I attached a patch to do so against the docs on
the REL_14_STABLE branch.

> To me it seems a bit problematic to introduce a divergence between windows /
> everything else here. Isn't that just going to lead to other complaints just
> like this thread, where somebody discovered the hard way that there's platform
> dependent behaviour here?

Of course, fixing this also for windows would be much better. There's two 
problems:
1. I cannot find any clear documentation on which functions are signal safe in 
Windows 
   and which are not. The only reference I can find is this: 
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/signal?view=msvc-170
 
   However, this says that you should not use any function that generates a 
system call.  
   PQcancel is obviously already violating that when calling "connect", so this 
is not very helpful.
2. My Windows C experience is non existent, so I don't think I would be the 
right person to write this code.

IMO blocking this bugfix, because it does not fix it for Windows, would be an 
example of perfect 
becoming the enemy of good. One thing I could do is add a note to the docs that 
these options 
are not supported on Windows for cancellation requests (similar to my proposed 
doc change 
for PG14 and below).

0001-Refactor-to-remove-internal_cancel-helper.patch
Description: 0001-Refactor-to-remove-internal_cancel-helper.patch


0002-Use-timeout-socket-options-for-cancel-connections.patch
Description:  0002-Use-timeout-socket-options-for-cancel-connections.patch


0003-Honor-connect_timeout-when-connecting-with-PQcancel.patch
Description:  0003-Honor-connect_timeout-when-connecting-with-PQcancel.patch


0001-Doc-explain-that-connection-options-don-t-apply-to-c.patch
Description:  0001-Doc-explain-that-connection-options-don-t-apply-to-c.patch


Re: sqlsmith: ERROR: XX000: bogus varno: 2

2022-01-06 Thread Tom Lane
Amit Langote  writes:
> Thanks for addressing that in the patch you posted.  I guess fixing
> only expression_tree_walker/mutator() suffices for now, but curious to
> know if it was intentional that you decided not to touch the following
> sites:

> exprCollation(): it would perhaps make sense to return the collation
> assigned to the 1st element of listdatums/lowerdatums/upperdatums,
> especially given that transformPartitionBoundValue() does assign a
> collation to the values in those lists based on the parent's partition
> key specification.

But each column could have a different collation, no?  I do not
think it's sensible to pick one of those at random and claim
that's the collation of the whole thing.  So throwing an error
seems appropriate.

> exprType(): could be handled similarly

The same, in spades.  Anybody who is asking for "the type"
of a relpartbound is misguided.

> queryjumble.c: JumbleExpr(): whose header comment says:

If somebody needs that, I wouldn't object to adding support there.
But right now it would just be dead code, so why bother?

regards, tom lane




Re: Support tab completion for upper character inputs in psql

2022-01-06 Thread Tom Lane
Peter Eisentraut  writes:
> So the ordering of the suggested completions is different.  I don't know 
> offhand how that ordering is determined.  Perhaps it's dependent on 
> locale, readline version, or operating system.  In any case, we need to 
> figure this out to make this test stable.

I don't think we want to get into the business of trying to make that
consistent across different readline/libedit versions.  How about
adjusting the test case so that only one enum value is to be printed?

regards, tom lane




Re: ICU for global collation

2022-01-06 Thread Julien Rouhaud
On Thu, Jan 06, 2022 at 01:55:55PM +, Finnerty, Jim wrote:
> 
> I didn't notice anything version-specific about the patch.  Would any
> modifications be needed to backport it to pg13 and pg14?

This is a new feature so it can't be backported.  The changes aren't big and
mostly touch places that haven't changed in a long time, so I don't think it
would take much effort if you wanted to backport it on your own forks.

> After this patch goes in, the big next thing would be to support
> nondeterministic collations for LIKE, ILIKE and pattern matching operators in
> general.  Is anyone interested in working on that?

As far as I know you're the last person that seemed to be working on that topic
back in March :)




Re: SQL:2011 application time

2022-01-06 Thread Vik Fearing
On 1/5/22 11:03 PM, Corey Huinker wrote:
> 
> There was similar work being done for system periods, which are a bit
> simpler but require a side (history) table to be created.

This is false.  SYSTEM_TIME periods do not need any kind of history.
This was one of the problems I had with Surafel's attempt because it was
confusing the period with SYSTEM VERSIONING.  Versioning needs the
period but the inverse is not true.
-- 
Vik Fearing




Re: sqlsmith: ERROR: XX000: bogus varno: 2

2022-01-06 Thread Robert Haas
On Mon, Dec 20, 2021 at 4:20 PM Tom Lane  wrote:
> pg_get_expr doesn't (or didn't) depend on expression_tree_walker,
> so there wasn't a problem there before.  I am worried that there
> might be other code paths, now or in future, that could try to apply
> expression_tree_walker/mutator to relpartbound trees, which is
> why I think it's a reasonable idea to teach them about such trees.

I agree that doing so is totally reasonable. I merely don't think that
previous failure to do so makes anyone a "bozo". It was far from
obvious that it was required.

> It's only a problem if you hold the opinion that there should be
> no user-reachable ERRCODE_INTERNAL_ERROR errors.  Which is a fine
> ideal, but I fear we're a pretty long way off from that.

I do hold that opinion, and I think we ought to work in that direction
even if we can't hope to get there quickly.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Refactoring of compression options in pg_basebackup

2022-01-06 Thread Robert Haas
On Thu, Jan 6, 2022 at 12:04 AM Michael Paquier  wrote:
> Yeah.  There are cases for both.  I just got to wonder whether it
> makes sense to allow both server-side and client-side compression to
> be used at the same time.  That would be a rather strange case, but
> well, with the correct set of options that could be possible.

I don't think it makes sense to support that. On the one hand, it
doesn't seem useful: compressing already-compressed data doesn't
usually work very well. Alternatively, I suppose the intent could be
to compress one way for transfer and then decompress and recompress
for storage, but that seems too inefficient to take seriously. On the
other hand, it requires a more complex user interface, and it's
already fairly complicated anyway.

> My view of things is slightly different, aka I'd rather keep
> --compress to mean a compression level with an integer option, but
> introduce a --compression-method={lz4,gzip,none}, with -Z being a
> synonym of --compression-method=gzip.  That's at least the path we
> chose for pg_receivewal.  I don't mind sticking with one way or
> another, as what you are proposing is basically the same thing I have
> in mind, but both tools ought to use the same set of options.

Did you mean that -z would be a synonym for --compression-method=gzip?
It doesn't really make sense for -Z to be that, unless it's also
setting the compression level.

My objection to --compress=$LEVEL is that the compression level seems
like it ought to rightfully be subordinate to the choice of algorithm.
In general, there's no reason why a compression algorithm has to offer
a choice of compression levels at all, or why they have to be numbered
0 through 9. For example, lz4 on my system claims to offer compression
levels from 1 through 12, plus a separate set of "fast" compression
levels starting with 1 and going up to an unspecified number. And then
it also has options to favor decompression speed, change the block
size, and a few other parameters. We don't necessarily want to expose
all of those options, but we should structure things so that we could
if it became important. The way to do that is to make the compression
algorithm the primary setting, and then anything else you can set for
that compressor is somehow a subordinate setting.

Put another way, we don't decide first that we want to compress with
level 7, and then afterward decide whether that's gzip, lz4, or bzip2.
We pick the compressor first, and then MAYBE think about changing the
compression level.
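
To make that concrete, here is a rough sketch of the kind of validation I have
in mind, where the level is checked against the already-chosen method rather
than being a free-standing setting. The names and exact ranges are invented for
illustration (the lz4 range just follows what its CLI reports on my machine);
this is not code from the patch.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { COMPR_NONE, COMPR_GZIP, COMPR_LZ4 } compr_method;

/* Reject a compression level that the chosen method cannot honor. */
static void
validate_compression_options(compr_method method, bool level_set, int level)
{
    if (!level_set)
        return;                 /* each method uses its own default level */

    if (method == COMPR_NONE)
    {
        fprintf(stderr, "a compression level requires a compression method\n");
        exit(1);
    }
    if (method == COMPR_GZIP && (level < 1 || level > 9))
    {
        fprintf(stderr, "gzip compression level must be between 1 and 9\n");
        exit(1);
    }
    if (method == COMPR_LZ4 && (level < 1 || level > 12))
    {
        fprintf(stderr, "lz4 compression level must be between 1 and 12\n");
        exit(1);
    }
}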

> > In the proposed patch, you end up with pg_basebackup
> > --compression-method=lz4 -Z2 meaning compression with lz4 level 2. I
> > find that quite odd, though as with all such things, opinions may
> > vary. In my proposal, that would be an error, because it would be
> > equivalent to --compress=lz4 --compress=gzip --compression-level=2,
> > and would thus involve conflicting compression method specifications.
>
> It seems to me that you did not read the patch closely enough.  The
> attached patch does not add support for LZ4 in pg_basebackup on the
> client-side yet.  Once it gets added, though, the idea is that  using
> --compress with LZ4 would result in an error.  That's what happens
> with pg_receivewal on HEAD, for one.  The patch just shapes things to
> plug LZ4 more easily in the existing code of pg_basebackup.c, and
> walmethods.c.

Well what I was looking at was this:

- printf(_("  -Z, --compress=0-9 compress tar output with given
compression level\n"));
+ printf(_("  -Z, --compress=1-9 compress tar output with given
compression level\n"));
+ printf(_("  --compression-method=METHOD\n"
+ " method to compress data\n"));

That seems to show that, post-patch, the argument to -Z would be a
compression level, even if --compression-method were something other
than gzip.

It's possible that I haven't read something carefully enough, but to
me, what I said seems to be a straightforward conclusion based on
looking at the usage help in the patch. So if I came to the wrong
conclusion, perhaps that usage help isn't reflecting the situation you
intend to create, or not as clearly as it ought.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-06 Thread Simon Riggs
On Thu, 6 Jan 2022 at 13:15, Finnerty, Jim  wrote:
>
> (Maxim) Re: -- If after upgrade page has no free space for special data, 
> tuples are
>converted to "double xmax" format: xmin became virtual
>FrozenTransactionId, xmax occupies the whole 64bit.
>Page converted to new format when vacuum frees enough space.
>
> A better way would be to prepare the database for conversion to the 64-bit 
> XID format before the upgrade so that it ensures that every page has enough 
> room for the two new epochs (E bits).

Most software has a one-stage upgrade model. What you propose would
have us install 2 things, with a step in-between, which makes it
harder to manage.

> 1. Enforce the rule that no INSERT or UPDATE to an existing page will leave 
> less than E bits of free space on a heap page
>
> 2. Run an online and restartable task, analogous to pg_repack, that rewrites 
> and splits any page that has less than E bits of free space. This needs to be 
> run on all non-temp tables in all schemas in all databases.  DDL operations 
> are not allowed on a target table while this operation runs, which is 
> enforced by taking an ACCESS SHARE lock on each table while the process is 
> running. To mitigate the effects of this restriction, the restartable task 
> can be restricted to run only in certain hours.  This could be implemented as 
> a background maintenance task that runs for X hours as of a certain time of 
> day and then kicks itself off again in 24-X hours, logging its progress.
>
> When this task completes, the database is ready for upgrade to 64-bit XIDs, 
> and there is no possibility that any page has insufficient free space for the 
> special data.
>
> Would you agree that this approach would completely eliminate the need for a 
> "double xmax" representation?

I agree about the idea for scanning existing data blocks, but why not
do this AFTER upgrade?

1. Upgrade, with the important aspect not yet enabled but everything else
working - all required software is delivered in one shot, with a fast
upgrade
2. As each table is VACUUMed, we confirm/clean/groom data blocks so that
each table is individually confirmed as being ready. The pace at which
this happens is under user control.
3. When all tables have been prepared, restart to allow xid64 format usage

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Pavel Stehule
On Thu, Jan 6, 2022 at 14:28, Joel Jacobson wrote:

> On Thu, Jan 6, 2022, at 10:05, Julien Rouhaud wrote:
> > I agree, but on the other hand I don't see how defining a top level block
> > alias identical for every single plpgsql function would really make
> sense.
> > Not every function has a very long name and would benefit from it, and
> no one
> > can really assume that e.g. "root" or whatever configured name won't be
> used in
> > any plpgsql function on that database or cluster.  So while having some
> global
> > configuration infrastructure can be useful I still don't think that it
> could be
> > used for this purpose.
>
> How about using the existing reserved keyword "in" followed by "." (dot)
> and then the function parameter name?
>
> This idea is based on the assumption "in." would always be a syntax error
> everywhere in both SQL and PL/pgSQL,
> so if we would introduce such a syntax, no existing code could be
> affected, except currently invalid code.
>
> I wouldn't mind using "in." to refer to IN/OUT/INOUT parameters and not
> only IN ones, it's a minor confusion that could be explained in the docs.
>

You are right, in.outvar looks messy. Moreover, maybe the SQL parser can
have a problem with it.

Regards

Pavel




>
> Also, "out." wouldn't work, since "out" is not a reserved keyword.
>
> /Joel
>
>


Re: ICU for global collation

2022-01-06 Thread Finnerty, Jim

I didn't notice anything version-specific about the patch.  Would any 
modifications be needed to backport it to pg13 and pg14?

After this patch goes in, the big next thing would be to support 
nondeterministic collations for LIKE, ILIKE and pattern matching operators in 
general.  Is anyone interested in working on that?

On 1/5/22, 10:36 PM, "Julien Rouhaud"  wrote:




On Tue, Jan 04, 2022 at 05:03:10PM +0100, Peter Eisentraut wrote:
> On 04.01.22 03:21, Julien Rouhaud wrote:
>
> > > - if (!lc_collate_is_c(collid) && collid != DEFAULT_COLLATION_OID)
> > > - mylocale = pg_newlocale_from_collation(collid);
> > > + if (!lc_collate_is_c(collid))
> > > + {
> > > + if (collid != DEFAULT_COLLATION_OID)
> > > + mylocale = pg_newlocale_from_collation(collid);
> > > + else if (default_locale.provider == COLLPROVIDER_ICU)
> > > + mylocale = &default_locale;
> > > + }
> >
> > There are really a lot of places with this new code.  Maybe it could be 
some
> > new function/macro to wrap that for the normal case (e.g. not 
formatting.c)?
>
> Right, we could just put this into pg_newlocale_from_collation(), but the
> comment there says
>
>  * In fact, they shouldn't call this function at all when they are dealing
>  * with the default locale.  That can save quite a bit in hotspots.
>
> I don't know how to assess that.
>
> We could pack this into a macro or inline function if we are concerned 
about
> this.

Yes, that was my idea: just have a new function (an inline function or a macro,
since pg_newlocale_from_collation() clearly warns about performance concerns)
that has the whole is-not-c-collation-and-is-default-collation-or-icu-collation
logic and calls pg_newlocale_from_collation() only when needed.
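
A minimal sketch of such a helper, assuming the default_locale variable and
provider field introduced by the patch (the function name is invented):

static inline pg_locale_t
pg_newlocale_if_needed(Oid collid)
{
    /* C collation never needs a locale object */
    if (lc_collate_is_c(collid))
        return NULL;

    /* Non-default collations go through the regular (cached) path */
    if (collid != DEFAULT_COLLATION_OID)
        return pg_newlocale_from_collation(collid);

    /* Default collation: only ICU needs an explicit locale object */
    if (default_locale.provider == COLLPROVIDER_ICU)
        return &default_locale;

    return NULL;
}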





Re: Make relfile tombstone files conditional on WAL level

2022-01-06 Thread Robert Haas
On Thu, Jan 6, 2022 at 3:47 AM Thomas Munro  wrote:
> Another problem is that relfilenodes are normally allocated with
> GetNewOidWithIndex(), and initially match a relation's OID.  We'd need
> a new allocator, and they won't be able to match the OID in general
> (while we have 32 bit OIDs at least).

Personally I'm not sad about that. Values that are the same in simple
cases but diverge in more complex cases are kind of a trap for the
unwary. There's no real reason to have them ever match. Yeah, in
theory, it makes it easier to tell which file matches which relation,
but in practice, you always have to double-check in case the table has
ever been rewritten. It doesn't seem worth continuing to contort the
code for a property we can't guarantee anyway.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-01-06 Thread Joel Jacobson
On Thu, Jan 6, 2022, at 10:05, Julien Rouhaud wrote:
> I agree, but on the other hand I don't see how defining a top level block
> alias identical for every single plpgsql function would really make sense.
> Not every function has a very long name and would benefit from it, and no one
> can really assume that e.g. "root" or whatever configured name won't be used 
> in
> any plpgsql function on that database or cluster.  So while having some global
> configuration infrastructure can be useful I still don't think that it could 
> be
> used for this purpose.

How about using the existing reserved keyword "in" followed by "." (dot) and 
then the function parameter name?

This idea is based on the assumption "in." would always be a syntax error 
everywhere in both SQL and PL/pgSQL,
so if we would introduce such a syntax, no existing code could be affected, 
except currently invalid code.

I wouldn't mind using "in." to refer to IN/OUT/INOUT parameters and not only IN 
ones, it's a minor confusion that could be explained in the docs.

Also, "out." wouldn't work, since "out" is not a reserved keyword.

/Joel


Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-06 Thread Finnerty, Jim
(Maxim) Re: -- If after upgrade page has no free space for special data, tuples 
are
   converted to "double xmax" format: xmin became virtual
   FrozenTransactionId, xmax occupies the whole 64bit.
   Page converted to new format when vacuum frees enough space.

A better way would be to prepare the database for conversion to the 64-bit XID 
format before the upgrade so that it ensures that every page has enough room 
for the two new epochs (E bits).

1. Enforce the rule that no INSERT or UPDATE to an existing page will leave 
less than E bits of free space on a heap page

2. Run an online and restartable task, analogous to pg_repack, that rewrites 
and splits any page that has less than E bits of free space. This needs to be 
run on all non-temp tables in all schemas in all databases.  DDL operations are 
not allowed on a target table while this operation runs, which is enforced by 
taking an ACCESS SHARE lock on each table while the process is running. To 
mitigate the effects of this restriction, the restartable task can be 
restricted to run only in certain hours.  This could be implemented as a 
background maintenance task that runs for X hours as of a certain time of day 
and then kicks itself off again in 24-X hours, logging its progress.

When this task completes, the database is ready for upgrade to 64-bit XIDs, and 
there is no possibility that any page has insufficient free space for the 
special data.

Would you agree that this approach would completely eliminate the need for a 
"double xmax" representation? 

/Jim




Re: row filtering for logical replication

2022-01-06 Thread Euler Taveira
On Thu, Jan 6, 2022, at 1:18 AM, Amit Kapila wrote:
> On Thu, Jan 6, 2022 at 8:43 AM Peter Smith  wrote:
> >
> > On Wed, Jan 5, 2022 at 9:52 PM Amit Kapila  wrote:
> > >
> > ...
> >
> > > Another minor comment:
> > > +static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype,
> > >
> > > Do we need to specify the 'enum' type before changetype parameter?
> > >
> >
> > That is because there is currently no typedef for the enum
> > ReorderBufferChangeType.
> >
> 
> But I see that the 0002 patch is already adding the required typedef.
IMO we shouldn't reuse ReorderBufferChangeType. As a long-term solution, it is
fragile: ReorderBufferChangeType has values that do not matter for the row
filter, and the code relies on the fact that REORDER_BUFFER_CHANGE_INSERT,
REORDER_BUFFER_CHANGE_UPDATE and REORDER_BUFFER_CHANGE_DELETE are the first 3
values of the enum; otherwise, it breaks rfnodes and no_filters in
pgoutput_row_filter(). I suggest a separate enum that contains only these 3
values.

enum RowFilterPublishAction {
   PUBLISH_ACTION_INSERT,
   PUBLISH_ACTION_UPDATE,
   PUBLISH_ACTION_DELETE
};
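
pgoutput_row_filter() would then map the change type it receives to this enum
at its entry point. A rough sketch (the function name is invented):

static enum RowFilterPublishAction
get_publish_action(ReorderBufferChangeType changetype)
{
    switch (changetype)
    {
        case REORDER_BUFFER_CHANGE_INSERT:
            return PUBLISH_ACTION_INSERT;
        case REORDER_BUFFER_CHANGE_UPDATE:
            return PUBLISH_ACTION_UPDATE;
        case REORDER_BUFFER_CHANGE_DELETE:
            return PUBLISH_ACTION_DELETE;
        default:
            elog(ERROR, "unexpected change type %d", (int) changetype);
            return PUBLISH_ACTION_INSERT;   /* keep compiler quiet */
    }
}

That keeps the row-filter code independent of where those values happen to sit
in ReorderBufferChangeType.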


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: sqlsmith: ERROR: XX000: bogus varno: 2

2022-01-06 Thread Amit Langote
On Thu, Jan 6, 2022 at 3:43 PM Amit Langote  wrote:
> On Tue, Dec 21, 2021 at 6:20 AM Tom Lane  wrote:
> > Robert Haas  writes:
> > > OK ... but my point is that dump and restore does work. So whatever
> > > cases pg_get_expr() doesn't work must be cases that aren't needed for
> > > that to happen. Otherwise this problem would have been found long ago.
> >
> > pg_get_expr doesn't (or didn't) depend on expression_tree_walker,
> > so there wasn't a problem there before.  I am worried that there
> > might be other code paths, now or in future, that could try to apply
> > expression_tree_walker/mutator to relpartbound trees, which is
> > why I think it's a reasonable idea to teach them about such trees.
> >
> > > I realize that's a deep hole out of which we're unlikely to be able to
> > > climb in the short or even medium term, but we don't have to keep
> > > digging. We either make a rule that pg_get_expr() can apply to
> > > everything stored in the catalogs and produce sensible answers, which
> > > seems to be what you prefer, or we make it return nice errors for the
> > > cases that it can't handle nicely, or some combination of the two. And
> > > whatever we decide, we also document and enforce everywhere.
> >
> > I think having pg_get_expr throw an error for a query, as opposed to an
> > expression, is fine.  What I don't want to do is subdivide things a lot
> > more finely than that; thus lumping "relpartbound" into "expression"
> > seems like a reasonable thing to do.  Especially since we already did it
> > six years ago.
>
> I admit that it was an oversight on my part that relpartbound trees
> are not recognized by nodeFuncs.c. :-(
>
> Thanks for addressing that in the patch you posted.  I guess fixing
> only expression_tree_walker/mutator() suffices for now...

Also, I wondered if it might be a good idea to expand the comment
above NodeTag definition in nodes.h to tell someone adding new types
to also look in nodeFuncs.c to check if any of the functions there
need to be updated.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2022-01-06 Thread Alvaro Herrera
On 2022-Jan-06, Amit Langote wrote:

> On Thu, Jan 6, 2022 at 7:27 AM Alvaro Herrera  wrote:

> > I have pushed it thinking that we would not backpatch any of this fix.
> > However, after running the tests and realizing that I didn't need an
> > initdb for either patch, I wonder if maybe the whole series *is*
> > backpatchable.
> 
> We do lack help from trigger.c in the v12 branch in that there's no
> Trigger.tgisclone, which is used in a couple of places in the fix.  I
> haven't checked how big of a deal it would be to back-port
> Trigger.tgisclone to v12, but maybe that's doable.

Yeah, I realized afterwards that we added tgparentid in 13 only
(b9b408c48), so we should only backpatch to that.

> > There is one possible problem, which is that psql and pg_dump would need
> > testing to verify that they work decently (i.e. no crash, no
> > misbehavior) with partitioned tables created with the original code.
> 
> I suppose you mean checking if the psql and pg_dump after applying
> *0001* work sanely with partitioned tables defined without 0001?

Yes.

> Will test that.

I looked at the backpatch at the last minute yesterday.  The tablecmds.c
conflict is easy to resolve, but the one in pg_dump.c is a giant
conflict zone that I didn't have time to look at closely :-(

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/



