Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Julien Rouhaud
On Fri, Aug 27, 2021 at 12:41 PM Michael Paquier  wrote:
>
> Perhaps.  That feels like a topic different than what's discussed
> here, though, because we don't really need to check if a bgworker has
> been launched or not.  We just need to make sure that it never runs in
> the context of a binary upgrade, like autovacuum.

I'm a bit confused.  Did you mean checking if a bgworker has been
*registered* or not?

But my point was that preventing a bgworker's registration as a way to
keep it from being launched may lead to problems if, for instance, an
extension decides that a failure to register is problematic enough to
prevent startup altogether.  I'm ok if we decide that it's *not* an
acceptable behavior, but that should be made clear, and probably
documented.
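
For illustration, this is the usual registration pattern from an
extension's _PG_init(); note that RegisterBackgroundWorker() returns
void, so a failed registration is silent (a minimal sketch, with
my_extension/my_worker_main as placeholder names):

	#include "postgres.h"
	#include "postmaster/bgworker.h"

	PG_MODULE_MAGIC;

	void _PG_init(void);

	void
	_PG_init(void)
	{
		BackgroundWorker worker;

		memset(&worker, 0, sizeof(worker));
		worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
		worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
		worker.bgw_restart_time = BGW_NEVER_RESTART;
		snprintf(worker.bgw_name, BGW_MAXLEN, "my worker");
		snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
		snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_worker_main");

		/* No return value: the extension cannot tell whether this worked. */
		RegisterBackgroundWorker(&worker);
	}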




Re: create table like: ACCESS METHOD

2021-08-26 Thread Michael Paquier
On Tue, Jun 01, 2021 at 02:10:45PM -0500, Justin Pryzby wrote:
> rebased and alphabetized

+   /* ACCESS METHOD doesn't apply and isn't copied for partitioned tables */
+   if ((table_like_clause->options & CREATE_TABLE_LIKE_ACCESS_METHOD) != 0 &&
+   !cxt->ispartitioned)
+   cxt->accessMethod = get_am_name(relation->rd_rel->relam);
I was thinking about an ERROR here, but all the other options, even
when specified, only do their work if applicable, so that's fine.  We
should have a test with a partitioned table and the clause specified,
though.

+ 
+  The table's access method will be copied.  By default, the
+  default_table_access_method is used.
+ 
Why is there any need to mention default_table_access_method?  This
just inherits the AM from the source table, which has nothing to do
with the default directly.

+CREATE ACCESS METHOD heapdup TYPE TABLE HANDLER heap_tableam_handler;
+CREATE TABLE likeam() USING heapdup;
+CREATE TABLE likeamlike(LIKE likeam INCLUDING ALL);
Rather than creating a custom AM in this test path, I would be
tempted to move that to create_am.sql.
--
Michael




Re: Failure of subscription tests with topminnow

2021-08-26 Thread Amit Kapila
On Thu, Aug 26, 2021 at 4:28 PM Ajin Cherian  wrote:
>
> On Thu, Aug 26, 2021 at 2:45 PM Masahiko Sawada  wrote:
>
> Attaching a patch on head that modifies this particular script to also
> consider the state of the walsender.
>

I think the fix is correct but similar changes are required in
022_twophase_cascade.pl as well (search for $oldpid in tests). I am
not completely sure but I think it is better to make this test change
in back branches as well to make it stable and reduce such random
build farm failures.

-- 
With Regards,
Amit Kapila.




Re: log_autovacuum in Postgres 14 -- ordering issue

2021-08-26 Thread Peter Geoghegan
On Wed, Aug 25, 2021 at 5:02 PM Peter Geoghegan  wrote:
> > I like it better than the current layout, so +1.
>
>  This seems like a release housekeeping task to me. I'll come up with
> a patch targeting 14 and master in a few days.

Here is a patch that outputs log_autovacuum's lines in this order:

LOG:  automatic vacuum of table "regression.public.foo": index scans: 1
pages: 9600 removed, 0 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 2169423 removed, 0 remain, 0 are dead but not yet removable, oldest xmin: 731
index "foo_pkey": pages: 5951 in total, 5947 newly deleted, 5947 currently deleted, 0 reusable
I/O timings: read: 75.394 ms, write: 76.980 ms
avg read rate: 103.349 MB/s, avg write rate: 73.317 MB/s
buffer usage: 47603 hits, 32427 misses, 23004 dirtied
WAL usage: 46607 records, 1 full page images, 15841331 bytes
system usage: CPU: user: 1.18 s, system: 0.23 s, elapsed: 2.45 s

Barring any objections, I'll commit this in a day or two, backpatching
to 14.

Thanks
-- 
Peter Geoghegan


v1-0001-Reorder-log_autovacuum-output.patch
Description: Binary data


RE: Separate out FileSet from SharedFileSet (was Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o)

2021-08-26 Thread houzj.f...@fujitsu.com
On Fri, Aug 27, 2021 1:25 PM Dilip Kumar   wrote:
> On Thu, Aug 26, 2021 at 2:10 PM Amit Kapila  
> wrote:
> 
>> Thanks, Sawada-San and Dilip for confirmation. I would like to commit
>> this and the second patch (the second one still needs some more
>> testing and review) for PG-15 as there is no bug per-se related to
>> this work in PG-14 but I see an argument to commit this for PG-14 to
>> keep the code (APIs) consistent. What do you think? Does anybody else
>> have any opinion on this?

> IMHO, this is a fair amount of refactoring and this is actually an
> improvement patch so we should push it to v15.

+1

Best regards,
Hou zj


RE: Separate out FileSet from SharedFileSet (was Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o)

2021-08-26 Thread houzj.f...@fujitsu.com
On Thu, Aug 26, 2021 2:18 PM Dilip Kumar  wrote:

> The patch looks good to me, I have rebased 0002 atop
> this patch and also done some cosmetic fixes in 0002. 

Here are some comments for the 0002 patch.

1)

- * toplevel transaction. Each subscription has a separate set of files.
+ * toplevel transaction. Each subscription has a separate files.

a separate files => a separate file

2)
+* Otherwise, just open it file for writing, in append mode.
 */

open it file => open the file


3)
if (subxact_data.nsubxacts == 0)
{
-   if (ent->subxact_fileset)
-   {
-   cleanup_subxact_info();
-   FileSetDeleteAll(ent->subxact_fileset);
-   pfree(ent->subxact_fileset);
-   ent->subxact_fileset = NULL;
-   }
+   cleanup_subxact_info();
+   BufFileDeleteFileSet(stream_fileset, path, true);
+


Before applying the patch, the code only invoked cleanup_subxact_info()
when the file existed.  After applying the patch, it will invoke
cleanup_subxact_info() whether or not the file exists.  Is that correct?


4)
	/*
-	 * If this is the first streamed segment, the file must not exist, so make
-	 * sure we're the ones creating it. Otherwise just open the file for
-	 * writing, in append mode.
+	 * If this is the first streamed segment, create the changes file.
+	 * Otherwise, just open it file for writing, in append mode.
	 */
	if (first_segment)
-	{
...
-		if (found)
-			ereport(ERROR,
-					(errcode(ERRCODE_PROTOCOL_VIOLATION),
-					 errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
...
-	}
+	stream_fd = BufFileCreateFileSet(stream_fileset, path);


Since BufFileCreateFileSet() doesn't check for the file's existence,
this change seems to remove the file-existence check that the old code
performed.


Best regards,
Hou zj


Re: Error code for checksum failure in origin.c

2021-08-26 Thread Kasahara Tatsuhito
On Fri, Aug 27, 2021 at 1:32 PM Amit Kapila  wrote:
>
> On Thu, Aug 26, 2021 at 4:11 PM Daniel Gustafsson  wrote:
> >
> > > On 26 Aug 2021, at 12:03, Amit Kapila  wrote:
> > >
> > > On Thu, Aug 26, 2021 at 3:18 PM Kasahara Tatsuhito
> > >  wrote:
> > >>
> > >> Hi.
> > >>
> > >> In the code in src/backend/replication/logical/origin.c,
> > >> the error code "ERRCODE_CONFIGURATION_LIMIT_EXCEEDED" is given
> > >> when a checksum check results in an error,
> > >> but "ERRCODE_DATA_CORRUPTED" seems to be more appropriate.
> > >>
> > >
> > > +1. Your observation looks correct to me and the error code suggested
> > > by you seems appropriate. We use ERRCODE_DATA_CORRUPTED in
> > > ReadTwoPhaseFile() for similar error.
> >
> > Agreed, +1 for changing this.
> >
>
> I think we need to backpatch this till 9.6 as this is introduced by
> commit 5aa2350426.
+1
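
For reference, the change under discussion would look roughly like this
(an illustrative sketch of the ereport() call in origin.c, not the
exact committed diff):

	if (file_crc != crc)
		ereport(PANIC,
				(errcode(ERRCODE_DATA_CORRUPTED),	/* was ERRCODE_CONFIGURATION_LIMIT_EXCEEDED */
				 errmsg("replication slot checkpoint has wrong checksum %u, expected %u",
						crc, file_crc)));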

Best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Separate out FileSet from SharedFileSet (was Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o)

2021-08-26 Thread Dilip Kumar
On Thu, Aug 26, 2021 at 2:10 PM Amit Kapila  wrote:

>
> Thanks, Sawada-San and Dilip for confirmation. I would like to commit
> this and the second patch (the second one still needs some more
> testing and review) for PG-15 as there is no bug per-se related to
> this work in PG-14 but I see an argument to commit this for PG-14 to
> keep the code (APIs) consistent. What do you think? Does anybody else
> have any opinion on this?
>
>
IMHO, this is a fair amount of refactoring and this is actually an
improvement patch so we should push it to v15.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Error code for checksum failure in origin.c

2021-08-26 Thread Michael Paquier
On Fri, Aug 27, 2021 at 10:02:26AM +0530, Amit Kapila wrote:
> I think we need to backpatch this till 9.6 as this is introduced by
> commit 5aa2350426. Any objections to that?

None.
--
Michael




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

2021-08-26 Thread Michael Paquier
On Wed, Aug 25, 2021 at 01:20:03AM -0700, Andres Freund wrote:
> On 2021-08-25 12:51:58 +0900, Michael Paquier wrote:
> As I said before, this ship has long sailed:
> 
> typedef struct PgStat_MsgTabstat
> {
>   PgStat_MsgHdr m_hdr;
>   Oid m_databaseid;
>   int m_nentries;
>   int m_xact_commit;
>   int m_xact_rollback;
>   PgStat_Counter m_block_read_time;   /* times in microseconds */
>   PgStat_Counter m_block_write_time;
>   PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
> } PgStat_MsgTabstat;

Well, I kind of misread what you meant upthread then.
PgStat_MsgTabstat has a somewhat misleading name, especially if you
assign connection stats to it.

>> As of the two problems discussed on this thread, aka the increased
>> number of UDP packages and the extra timestamp computations, it seems
>> to me that we had better combine the following ideas for HEAD and 14,
>> for now:
>> - Avoid the extra timestamp computation as proposed by Laurenz in [1]
>> - Throttle the frequency where the connection stat packages are sent,
>> as of [2].
> 
> I think in that case we'd have to do the bigger redesign and move "live"
> connection stats to backend_status.c...

Hmm.  A redesign is not really an option for 14 at this stage.  And I
am not really comfortable either with the latest proposal from upthread
to plug that into pgstat_send_tabstat to report things once per
transaction.  It really looks like this needs more thought, which would
mean that a revert may be the most appropriate choice for the moment.
That's the last-resort option, surely, but we are post-beta3 so there
is not much margin left.
--
Michael




Fix erroneous parallel execution when modifying CTE is present in rewritten query

2021-08-26 Thread Greg Nancarrow
Hi Hackers,

There is a known bug in the query rewriter: if a query that has a
modifying CTE is re-written, the hasModifyingCTE flag is not set in the
re-written query.  This bug can allow the query to execute in parallel
mode, which then results in an error.

For more details from a previous discussion about this, and a test case
that illustrates the issue, refer to:
https://postgr.es/m/CAJcOf-fAdj=ndkmsrhqzndm-o13ny4dl6xgcevdx5xvbbi0...@mail.gmail.com


As a proposal to fix this problem, I've attached a patch which:

1) Copies the associated hasModifyingCTE and hasRecursive flags when the
rewriter combines CTE lists (using Tom Lane's initial patch code seen in
[1]). This flag copying is missing from the current Postgres code.
2) Adds an error case to specifically disallow applying an
INSERT...SELECT rule action to a command with a data-modifying CTE.  In
this case the rewritten query would end up with a data-modifying CTE
that is not at the top level (it becomes associated with the SELECT
subquery part), something Postgres does not allow when such a query is
entered normally.  Since the error check enforcing that a modifying CTE
be at the top level lives in the parser, this case currently escapes
detection in the rewriter.
3) Modifies the existing test case in with.sql that tests the merging of an
outer CTE with a CTE in a rule action (as currently that rule action is
using INSERT...SELECT).


For the record, a workaround for this issue (at least addressing how
hasModifyingCTE is meant to exclude the query from parallel execution) has
been suggested in the past, but was not well received. It is the following
addition to the max_parallel_hazard_walker() function:

+ /*
+  * ModifyingCTE expressions are treated as parallel-unsafe.
+  *
+  * XXX Normally, if the Query has a modifying CTE, the hasModifyingCTE
+  * flag is set in the Query tree, and the query will be regarded as
+  * parallel-unsafe. However, in some cases, a re-written query with a
+  * modifying CTE does not have that flag set, due to a bug in the query
+  * rewriter. The following else-if is a workaround for this bug, to detect
+  * a modifying CTE in the query and regard it as parallel-unsafe. This
+  * comment, and the else-if block immediately below, may be removed once
+  * the bug in the query rewriter is fixed.
+  */
+ else if (IsA(node, CommonTableExpr))
+ {
+     CommonTableExpr *cte = (CommonTableExpr *) node;
+     Query           *ctequery = castNode(Query, cte->ctequery);
+
+     if (ctequery->commandType != CMD_SELECT)
+     {
+         context->max_hazard = PROPARALLEL_UNSAFE;
+         return true;
+     }
+ }
+


Regards,
Greg Nancarrow
Fujitsu Australia


v1-0001-Propagate-CTE-property-flags-when-copying-a-CTE-list.patch
Description: Binary data


Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Michael Paquier
On Fri, Aug 27, 2021 at 11:25:19AM +0800, Julien Rouhaud wrote:
> Right now AFAICT there's no official API to check if a call to
> RegisterBackgroundWorker() succeeded or not, but an extension could
> check by itself using BackgroundWorkerList in bgworker_internals.h,
> and error out or something if it didn't succeed, as a way to inform
> users that they didn't configure the instance properly (like maybe
> increasing max_worker_processes).  Surely using a *_internals.h header
> is a clear sign that you expose yourself to problems, but adding an
> official way to check for bgworker registration doesn't seem
> unreasonable to me.  Is it worth the risk of having pg_upgrade
> error out in this kind of scenario, or of making the addition of a new
> API to check for registration status more difficult?

Perhaps.  That feels like a topic different than what's discussed
here, though, because we don't really need to check if a bgworker has
been launched or not.  We just need to make sure that it never runs in
the context of a binary upgrade, like autovacuum.
--
Michael




Re: Skipping logical replication transactions on subscriber side

2021-08-26 Thread Amit Kapila
On Thu, Aug 26, 2021 at 6:24 PM Masahiko Sawada  wrote:
>
> On Thu, Aug 26, 2021 at 9:11 PM Amit Kapila  wrote:
> >
> >
> > Okay, changed accordingly. Additionally, I have changed the code which
> > sets timestamp to (unset) when it is 0 so that it won't display the
> > timestamp in that case. I have made few other cosmetic changes in the
> > attached patch. See and let me know what you think of it?
>
> Thank you for the patch!
>
> Agreed with these changes. The patch looks good to me.
>

Pushed, feel free to rebase and send the remaining patch set.

-- 
With Regards,
Amit Kapila.




Re: Error code for checksum failure in origin.c

2021-08-26 Thread Amit Kapila
On Thu, Aug 26, 2021 at 4:11 PM Daniel Gustafsson  wrote:
>
> > On 26 Aug 2021, at 12:03, Amit Kapila  wrote:
> >
> > On Thu, Aug 26, 2021 at 3:18 PM Kasahara Tatsuhito
> >  wrote:
> >>
> >> Hi.
> >>
> >> In the code in src/backend/replication/logical/origin.c,
> >> the error code "ERRCODE_CONFIGURATION_LIMIT_EXCEEDED" is given
> >> when a checksum check results in an error,
> >> but "ERRCODE_DATA_CORRUPTED" seems to be more appropriate.
> >>
> >
> > +1. Your observation looks correct to me and the error code suggested
> > by you seems appropriate. We use ERRCODE_DATA_CORRUPTED in
> > ReadTwoPhaseFile() for similar error.
>
> Agreed, +1 for changing this.
>

I think we need to backpatch this till 9.6 as this is introduced by
commit 5aa2350426. Any objections to that?

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Tab completion for ALTER TABLE … ADD …

2021-08-26 Thread Michael Paquier
On Tue, Aug 03, 2021 at 12:48:38PM +0100, Dagfinn Ilmari Mannsåker wrote:
> The other day I noticed that there's no tab completion after ALTER TABLE
> … ADD, so here's a patch.  In addition to COLUMN and all the table
> constraint types, it also completes with the list of unique indexes on
> the table after ALTER TABLE … ADD … USING INDEX.

I was reading this patch (not actually tested), and that's a clear
improvement.  One extra thing that could be done here is to complete
with types for a ALTER TABLE ADD COLUMN foo.  We could as well have a
list of columns after UNIQUE or PRIMARY KEY, but that feels like extra
cream on top of the cake.  In short I am fine with what you have
here.
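
For instance, the type completion could look something like this (an
untested sketch in the style of tab-complete.c; the rule in an actual
patch may differ):

	/* ALTER TABLE <name> ADD COLUMN <name> -> complete with a type */
	else if (Matches("ALTER", "TABLE", MatchAny, "ADD", "COLUMN", MatchAny))
		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_datatypes, NULL);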
--
Michael




Re: pg_receivewal starting position

2021-08-26 Thread Michael Paquier
On Thu, Aug 26, 2021 at 02:14:27PM +0200, Ronan Dunklau wrote:
> Following the discussion at [1], I refactored the implementation into 
> streamutil and added a third patch making use of it in pg_basebackup itself 
> in 
> order to fail early if the replication slot doesn't exist, so please find 
> attached v2 for that.

Thanks for the split.  That helps a lot.

+
+
 /*
  * Run IDENTIFY_SYSTEM through a given connection and give back to caller

The patch series has some noise diffs here and there, you may want to
clean up that for clarity.

+   if (slot == NULL || !slot->in_use)
+   {
+   LWLockRelease(ReplicationSlotControlLock);
+
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_OBJECT),
LWLocks are released on ERROR, so no need for LWLockRelease() here.

+
+  
+  Read information about the named replication slot.  This is
useful to determine which WAL location we should be asking the server
to start streaming at.

A nit.  You may want to be more careful with the indentation of the
documentation.  Things are usually limited in width for readability.
More markup would be nice for the field names used in the
descriptions.

+   if (slot == NULL || !slot->in_use)  

 [...]
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_OBJECT),
+errmsg("replication slot \"%s\" does not exist",
+   cmd->slotname)));
[...]
+   if (PQntuples(res) == 0)
+   {
+   pg_log_error("replication slot %s does not exist", slot_name);
+   PQclear(0);
+   return false;
So, the backend's ReadReplicationSlot() reports an ERROR if a slot
does not exist, but pg_basebackup's GetSlotInformation() does the same
if no tuples are returned.  That's inconsistent.  Wouldn't it be more
intuitive to return a NULL tuple if the slot does not exist, so that
frontends using this interface can check for real ERRORs separately?
A slot in use necessarily exists, so the error is a bit confusing here
anyway, no?
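
Something like building an all-NULLs tuple for a missing slot would do,
perhaps (an illustrative fragment; READ_REPLICATION_SLOT_COLS is a
placeholder name):

	Datum		values[READ_REPLICATION_SLOT_COLS];
	bool		nulls[READ_REPLICATION_SLOT_COLS];

	/* Default to an all-NULLs tuple, meaning "slot does not exist". */
	MemSet(nulls, true, sizeof(nulls));

	if (slot != NULL && slot->in_use)
	{
		/* ... fill values[] and clear the corresponding nulls[] ... */
	}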

+* XXX: should we allow the caller to specify which target timeline it wants
+* ?
+*/
What are you thinking about here?

-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full../
Typo.

+   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_lsn_timeline",
+ INT4OID, -1, 0);
+   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, 
"confirmed_flush_lsn_timeline",
+ INT4OID, -1, 0);
I would call these restart_tli and confirmed_flush_tli, without the
"lsn" part.

The patch for READ_REPLICATION_SLOT could have some tests using a
connection that has replication=1 in some TAP tests.  We do that in
001_stream_rep.pl with SHOW, as one example.

-   'slot0'
+   'slot0', '-p',
+   "$port"
Something we are missing here?
--
Michael




Re: row filtering for logical replication

2021-08-26 Thread Amit Kapila
On Fri, Aug 27, 2021 at 3:31 AM Peter Smith  wrote:
>
> On Thu, Aug 26, 2021 at 9:13 PM Amit Kapila  wrote:
> >
> > On Thu, Aug 26, 2021 at 3:41 PM Peter Smith  wrote:
> >
> > Apart from Truncate, it will also be a waste if any error happens
> > before actually evaluating the filter, tomorrow there could be other
> > operations like replication of sequences (I have checked that proposed
> > patch for sequences uses get_rel_sync_entry) where we don't need to
> > build ExprState (as filters might or might not be there). So, it would
> > be better to avoid cache lookups in those cases if possible. I still
> > think doing expensive things like preparing expressions should ideally
> > be done only when it is required.
>
> OK. Per your suggestion, I will try to move as much of the row-filter
> cache code as possible out of the get_rel_sync_entry function and into
> the pgoutput_row_filter function.
>

I could think of more scenarios where doing this work in
get_rel_sync_entry() could cost us without any actual need for it.
Consider a user who has published only the 'update' and 'delete'
operations for a publication, while the system sees inserts followed by
a truncate or any DDL that generates an invalidation.  In such a case,
for each change we would need to rebuild the row filters but would
never use them.  Similarly, this can happen with any other combination
of DML and DDL operations where the DML operation is not published.  I
don't want to say that this is the most common scenario, but it is
important to do expensive work only when it is actually required;
otherwise, there could be cases where it hits us.
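
As a rough illustration of that direction, the lazy build could look
like this (a sketch only; names like exprstate_valid and
pgoutput_row_filter_init_expr() are placeholders, not the actual
patch):

	static bool
	pgoutput_row_filter(Relation relation, HeapTuple tuple,
						RelationSyncEntry *entry)
	{
		/*
		 * Build the cached ExprState only on first use, so changes that
		 * never evaluate the filter (truncates, unpublished operations,
		 * errors raised earlier) don't pay for preparing the expression.
		 */
		if (!entry->exprstate_valid)
		{
			pgoutput_row_filter_init_expr(entry);
			entry->exprstate_valid = true;
		}

		/* ... evaluate entry->exprstate against the tuple ... */
		return true;
	}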

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Julien Rouhaud
On Fri, Aug 27, 2021 at 10:02 AM Michael Paquier  wrote:
>
> On Fri, Aug 27, 2021 at 09:34:24AM +0800, Julien Rouhaud wrote:
> > That shouldn't lead to any problem right?
>
> Well, bgworker_should_start_now() does not exist for that, and
> RegisterBackgroundWorker() is the one doing the classification job, so
> it would be more consistent to keep everything under control in the
> same code path.

I'm not sure it's that uncontroversial.  The way I see
RegisterBackgroundWorker(), it's responsible for doing some sanity
checks to catch mistakes in the module and to verify that resources
are available.  Surely checking for IsBinaryUpgrade should not be the
responsibility of third-party code, so the question is whether binary
upgrade is seen as a resource, and as such a reason to forbid bgworker
registration, as opposed to forbidding the launch itself.

Right now AFAICT there's no official API to check if a call to
RegisterBackgroundWorker() succeeded or not, but an extension could
check by itself using BackgroundWorkerList in bgworker_internals.h,
and error out or something if it didn't succeed, as a way to inform
users that they didn't configure the instance properly (like maybe
increasing max_worker_processes).  Surely using a *_internals.h header
is a clear sign that you expose yourself to problems, but adding an
official way to check for bgworker registration doesn't seem
unreasonable to me.  Is it worth the risk of having pg_upgrade
error out in this kind of scenario, or of making the addition of a new
API to check for registration status more difficult?




Re: [PoC] Federated Authn/z with OAUTHBEARER

2021-08-26 Thread Michael Paquier
On Thu, Aug 26, 2021 at 04:13:08PM +, Jacob Champion wrote:
> Do you mean heap-based? i.e. destroyJsonLexContext() does an
> unnecessary copy before free? Yeah, in that case it's not super useful,
> but I think I'd want some evidence that the performance hit matters
> before optimizing it.

As an authentication code path, the impact is minimal and my take on
that would be to keep the code simple.  Now if you'd really wish to
stress that without relying on the backend, one simple way is to use
pgbench -C -n with a mostly-empty script (one meta-command) coupled
with some profiling.
--
Michael




Re: cannot access to postgres-git via ssh

2021-08-26 Thread Kyotaro Horiguchi
At Thu, 26 Aug 2021 16:34:25 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Hello.

Hmm. I found w...@postgresql.org to be a more appropriate place to ask
this question.

Please ignore this.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Michael Paquier
On Fri, Aug 27, 2021 at 09:34:24AM +0800, Julien Rouhaud wrote:
> That shouldn't lead to any problem right?

Well, bgworker_should_start_now() does not exist for that, and
RegisterBackgroundWorker() is the one doing the classification job, so
it would be more consistent to keep everything under control in the
same code path.
--
Michael




Re: amcheck/verify_heapam doesn't check for interrupts

2021-08-26 Thread Peter Geoghegan
Patch committed.

Thanks!
-- 
Peter Geoghegan




Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Julien Rouhaud
On Fri, Aug 27, 2021 at 7:31 AM Michael Paquier  wrote:
>
> Indeed, there is some history here with autovacuum.  I have not been
> careful enough to check that.  Still, putting a check on
> IsBinaryUpgrade in bgworker_should_start_now() would mean that we
> still keep track of the set of bgworkers registered in shared memory.

That shouldn't lead to any problem right?

> Wouldn't it be better to block things at the source, as of
> RegisterBackgroundWorker()?  And that would keep track of the control
> we have on bgworkers in a single place.  I also think that we'd better
> document something about that either in bgworker.sgml or pg_upgrade's
> page.

I'm fine with that approach too.




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-26 Thread Julien Rouhaud
On Fri, Aug 27, 2021 at 3:42 AM Magnus Hagander  wrote:
>
> Yeah, but that does move the problem to the other side doesn't it? So
> if you (as a pure test of course) were to remove the variable
> completely from the included header and just declare it manually with
> PGDLLSPEC in your file, it should work?
>
> Ugly as it is, I wonder if there's a chance we could just process all
> the headers at install time and inject the PGDLLIMPORT. We know which
> symbols those are on account of what we're getting in the DEF file.
>
> Not saying that's not a very ugly solution, but it might work?

It's apparently not enough.  I tried with the autovacuum_max_workers
GUC, and it still errors out.
If I add a PGDLLIMPORT, there's a link error when trying to access the
variable:
error LNK2019: unresolved external symbol __imp_autovacuum_max_workers
referenced in function...

I think that means that msvc tries to link that to a DLL while it's
expected to be in postgres.lib, as the __imp_ prefix is added in that
case.

If I use PGDLLEXPORT I simply get:
error LNK2001: unresolved external symbol autovacuum_max_workers
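
For the archive, the failing experiment amounts to something like this
on the extension side, with the backend header left untouched (an
illustrative fragment only):

	/* extension.c: manual redeclaration, bypassing the header */
	extern __declspec(dllimport) int autovacuum_max_workers;

MSVC then either rejects the conflicting storage class at compile time
or, as above, leaves the __imp_ symbol unresolved at link time.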




Re: log_autovacuum in Postgres 14 -- ordering issue

2021-08-26 Thread Michael Paquier
On Wed, Aug 25, 2021 at 08:23:04PM -0400, Alvaro Herrera wrote:
> On 2021-Aug-25, Peter Geoghegan wrote:
>>  This seems like a release housekeeping task to me. I'll come up with
>> a patch targeting 14 and master in a few days.
> 
> Agreed, thanks.

Sorry for the late reply here.  Indeed, I can see your point about
moving the buffer usage a bit down, grouping it with the other
information related to I/O.  Moving this information down gives the
attached.  If you wish to do that yourself, that's fine by me, of
course.

That said, an ANALYZE-only command does amvacuumcleanup() for all
the indexes, and the stats exist.  I am not saying that we should do
that for 14 as that's too late, but could we consider adding the index
information in this case as well in 15~?
--
Michael
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 334d8a2aa7..a40d4943a3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -796,11 +796,6 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							 (long long) vacrel->new_rel_tuples,
 							 (long long) vacrel->new_dead_tuples,
 							 OldestXmin);
-			appendStringInfo(&buf,
-							 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) VacuumPageHit,
-							 (long long) VacuumPageMiss,
-							 (long long) VacuumPageDirty);
 			if (vacrel->rel_pages > 0)
 			{
 				BlockNumber orig_rel_pages;
@@ -844,6 +839,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 								 istat->pages_deleted,
 								 istat->pages_free);
 			}
+			appendStringInfo(&buf,
+							 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
+							 (long long) VacuumPageHit,
+							 (long long) VacuumPageMiss,
+							 (long long) VacuumPageDirty);
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
 			if (track_io_timing)




Re: amcheck/verify_heapam doesn't check for interrupts

2021-08-26 Thread Mark Dilger


> On Aug 26, 2021, at 4:41 PM, Mark Dilger  wrote:
> 
>> Seems that way. Want to post a patch?
> 
> Sure.



v1-0001-Add-CHECK_FOR_INTERRUPTS-to-verify_heapam.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Mark all GUC variable as PGDLLIMPORT

2021-08-26 Thread Michael Paquier
On Thu, Aug 26, 2021 at 05:10:39PM -0400, Andrew Dunstan wrote:
> On 8/26/21 3:57 PM, Robert Haas wrote:
>> On Thu, Aug 26, 2021 at 3:42 PM Magnus Hagander  wrote:
>>> Ugly as it is, I wonder if there's a chance we could just process all
>>> the headers at install time and inject the PGDLLIMPORT. We know which
>>> symbols those are on account of what we're getting in the DEF file.
>>> Not saying that's not a very ugly solution, but it might work?

I missed the word "install" first here :)

>> If it's ugly, that might mean it's a bad idea and we shouldn't do it
>> ... but if it can be made not-too-ugly, it would certainly be nice to
>> be able to stop worrying about this.
> 
> How is this going to affect msys builds? No gendef there IIRC. I guess
> some similar procedure might be possible ...

Wouldn't that be needed for cygwin as well?  If we go down the road of
enabling that for a maximum number of parameters, I would really argue
for doing it in a way that never gets forgotten for new parameters, so
that we don't have to discuss the matter anymore.  With all that in
mind, that would mean a new perl script that does the job, callable by
both MSVC and normal make builds.  But we have nothing today that
manipulates the installation contents.  And couldn't it be a problem if
an installation is overwritten, updated or upgraded, where there may be
contents not coming from the in-core build process but from some
extension?
--
Michael




Re: amcheck/verify_heapam doesn't check for interrupts

2021-08-26 Thread Mark Dilger



> On Aug 26, 2021, at 4:39 PM, Peter Geoghegan  wrote:
> 
>> Not any good one that I can see.
> 
> Seems that way. Want to post a patch?

Sure.  I just posted another unrelated patch for amcheck this morning, so it 
seems a good day for it :)

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: amcheck/verify_heapam doesn't check for interrupts

2021-08-26 Thread Peter Geoghegan
On Thu, Aug 26, 2021 at 4:24 PM Mark Dilger
 wrote:
> > On Aug 26, 2021, at 2:38 PM, Peter Geoghegan  wrote:
> > It looks like we neglect to call CHECK_FOR_INTERRUPTS() anywhere
> > inside verify_heapam.c. Is there any reason for this?
>
> Not any good one that I can see.

Seems that way. Want to post a patch?

> > Not sure if pg_amcheck itself is a factor here too -- didn't get that far.
>
> That runs an event loop in the client over multiple checks (heap and/or 
> btree) running in backends, just as reindexdb and vacuumdb do over parallel 
> reindexes and vacuums running in backends.  It should be just as safe to 
> ctrl-c out of pg_amcheck as out of those two.  They all three use 
> fe_utils/cancel.h's setup_cancel_handler(), so I would expect modifying 
> verify_heapam would be enough.

Right. I checked that out myself, after sending my email from earlier.
We don't have any problems when pg_amcheck happens to be verifying a
B-Tree index -- verify_nbtree.c already has CHECK_FOR_INTERRUPTS() at
a few key points.

-- 
Peter Geoghegan




Re: The Free Space Map: Problems and Opportunities

2021-08-26 Thread Peter Geoghegan
On Wed, Aug 25, 2021 at 10:58 AM Robert Haas  wrote:
> Makes sense.

I'm glad that the big picture stuff makes sense to you.

> I think one of the big implementation challenges here is
> coping with the scenario where there's not enough shared memory
> available ... or else somehow making that impossible without reserving
> an unreasonable amount of shared memory.

Yes, it'll definitely be necessary to nail that down.

> If you allowed space for
> every buffer to belong to a different relation and have the maximum
> number of leases and whatever, you'd probably have no possibility of
> OOM, but you'd probably be pre-reserving too much memory.

I hope that we can control the shared memory space overhead by making
it a function of max_connections, plus some configurable number of
relations that get modified within a single transaction. This approach
must behave in the same way when the number of tables that each
transaction actually modifies is high -- perhaps a transaction that
does this then pays a penalty in WAL logging within the FSM. I think
that that can be made manageable, especially if we can pretty much
impose the cost directly on those transactions that need to modify
lots of relations all at once. (If we can reuse the shared memory over
time it'll help too.)

> I also think
> there are some implementation challenges around locking.

That seems likely.

> You probably
> need some, because the data structure is shared, but because it's
> complex, it's not easy to create locking that allows for good
> concurrency. Or so I think.

My hope is that this design more than makes up for it by relieving
contention in other areas. Like buffer lock contention, or relation
extension lock contention.

> Andres has been working -- I think for years now -- on replacing the
> buffer mapping table with a radix tree of some kind. That strikes me
> as very similar to what you're doing here. The per-relation data can
> then include not only the kind of stuff you're talking about but very
> fundamental things like how long it is and where its buffers are in
> the buffer pool. Hopefully we don't end up with dueling patches.

I agree that there is definitely some overlap. I see no risk of a real
conflict, though. I have mostly been approaching this project as an
effort to fix the locality problems, mostly by looking for fixes to
the BenchmarkSQL workload's problems. I have to admit that the big
picture stuff about exploiting transactional semantics with free space
management is still pretty aspirational. The resource management parts
of my prototype patch are by far the kludgiest parts.

I hope that I can benefit from whatever work Andres has already done
on this, particularly when it comes to managing per-relation metadata
in shared memory.

--
Peter Geoghegan




Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Michael Paquier
On Thu, Aug 26, 2021 at 10:34:48AM -0400, Bruce Momjian wrote:
> On Thu, Aug 26, 2021 at 03:59:49PM +0200, Daniel Gustafsson wrote:
>> Agreed, in this particular case I think there is merit to the idea of 
>> enforcing
>> it in the backend.
> 
> OK, works for me

Indeed, there is some history here with autovacuum.  I have not been
careful enough to check that.  Still, putting a check on
IsBinaryUpgrade in bgworker_should_start_now() would mean that we
still keep track of the set of bgworkers registered in shared memory.
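
For clarity, that alternative would look roughly like this (a sketch of
the idea, not a tested patch):

	static bool
	bgworker_should_start_now(BgWorkerStartTime start_time)
	{
		/*
		 * Registered workers would stay tracked in shared memory, but
		 * never start during a binary upgrade, as we do for autovacuum.
		 */
		if (IsBinaryUpgrade)
			return false;

		/* ... existing checks against pmState and start_time ... */
	}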

Wouldn't it be better to block things at the source, as of
RegisterBackgroundWorker()?  And that would keep track of the control
we have on bgworkers in a single place.  I also think that we'd better 
document something about that either in bgworker.sgml or pg_upgrade's
page.
--
Michael




Re: amcheck/verify_heapam doesn't check for interrupts

2021-08-26 Thread Mark Dilger



> On Aug 26, 2021, at 2:38 PM, Peter Geoghegan  wrote:
> 
> It looks like we neglect to call CHECK_FOR_INTERRUPTS() anywhere
> inside verify_heapam.c. Is there any reason for this?

Not any good one that I can see.

> Can't we just
> put a CHECK_FOR_INTERRUPTS() at the top of the outermost loop, inside
> verify_heapam()?

I expect we could.

> Not sure if pg_amcheck itself is a factor here too -- didn't get that far.

That runs an event loop in the client over multiple checks (heap and/or btree) 
running in backends, just as reindexdb and vacuumdb do over parallel reindexes 
and vacuums running in backends.  It should be just as safe to ctrl-c out of 
pg_amcheck as out of those two.  They all three use fe_utils/cancel.h's 
setup_cancel_handler(), so I would expect modifying verify_heapam would be 
enough.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







CFM for september commitfest

2021-08-26 Thread Jaime Casanova
Hi everyone,

Do we already have a ${subject}? Otherwise I could offer myself.
If anyone agrees, this would be my first time as CFM, so I would
appreciate some help.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: row filtering for logical replication

2021-08-26 Thread Peter Smith
On Thu, Aug 26, 2021 at 9:13 PM Amit Kapila  wrote:
>
> On Thu, Aug 26, 2021 at 3:41 PM Peter Smith  wrote:
> >
> > On Thu, Aug 26, 2021 at 3:00 PM Amit Kapila  wrote:
> > >
> > > On Thu, Aug 26, 2021 at 9:51 AM Peter Smith  wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Thu, Aug 26, 2021 at 7:37 AM Peter Smith  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila 
> > > > > >  wrote:
> > > > > > >
> > > > > > ...
> > > > > > >
> > > > > > > Hmm, I think the gain via caching is not visible because we are 
> > > > > > > using
> > > > > > > simple expressions. It will be visible when we use somewhat 
> > > > > > > complex
> > > > > > > expressions where expression evaluation cost is significant.
> > > > > > > Similarly, the impact of this change will magnify and it will 
> > > > > > > also be
> > > > > > > visible when a publication has many tables. Apart from 
> > > > > > > performance,
> > > > > > > this change is logically correct as well because it would be any 
> > > > > > > way
> > > > > > > better if we don't invalidate the cached expressions unless 
> > > > > > > required.
> > > > > >
> > > > > > Please tell me what is your idea of a "complex" row filter 
> > > > > > expression.
> > > > > > Do you just mean a filter that has multiple AND conditions in it? I
> > > > > > don't really know if few complex expressions would amount to any
> > > > > > significant evaluation costs, so I would like to run some timing 
> > > > > > tests
> > > > > > with some real examples to see the results.
> > > > > >
> > > > >
> > > > > I think this means you didn't even understand or are convinced why the
> > > > > patch has cache in the first place. As per your theory, even if we
> > > > > didn't have cache, it won't matter but that is not true otherwise, the
> > > > > patch wouldn't have it.
> > > >
> > > > I have never said there should be no caching. On the contrary, my
> > > > performance test results [1] already confirmed that caching ExprState
> > > > is of benefit for the millions of times it may be used in the
> > > > pgoutput_row_filter function. My only doubts are in regard to how much
> > > > observable impact there would be re-evaluating the filter expression
> > > > just a few extra times by the get_rel_sync_entry function.
> > > >
> > >
> > > I think it depends but why in the first place do you want to allow
> > > re-evaluation when there is a way for not doing that?
> >
> > Because the current code logic of having the "delayed" ExprState
> > evaluation does come at some cost.
> >
>
> So, now you mixed it with the second point. Here, I was talking about
> the need for correct invalidation but you started discussing when to
> first time evaluate the expression, both are different things.
>
> >  And the cost is -
> > a. Needing an extra condition and more code in the function 
> > pgoutput_row_filter
> > b. Needing to maintain the additional Node list
> >
>
> I am not sure you need (b) above and I think (a) should make the
> overall code look clean.
>
> > If we chose not to implement a delayed ExprState cache evaluation then
> > there would still be a (one-time) ExprState cache evaluation but it
> > would happen whenever get_rel_sync_entry is called (regardless of if
> > pgoputput_row_filter is subsequently called). E.g. there can be some
> > rebuilds of the ExprState cache if the user calls TRUNCATE.
> >
>
> Apart from Truncate, it will also be a waste if any error happens
> before actually evaluating the filter, tomorrow there could be other
> operations like replication of sequences (I have checked that proposed
> patch for sequences uses get_rel_sync_entry) where we don't need to
> build ExprState (as filters might or might not be there). So, it would
> be better to avoid cache lookups in those cases if possible. I still
> think doing expensive things like preparing expressions should ideally
> be done only when it is required.

OK. Per your suggestion, I will try to move as much of the row-filter
cache code as possible out of the get_rel_sync_entry function and into
the pgoutput_row_filter function.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




amcheck/verify_heapam doesn't check for interrupts

2021-08-26 Thread Peter Geoghegan
I just noticed that the new heapam amcheck verification code can take
a very long time to respond to cancellations from pg_amcheck -- I saw
that it took over 2 minutes on a large database on my workstation.

It looks like we neglect to call CHECK_FOR_INTERRUPTS() anywhere
inside verify_heapam.c. Is there any reason for this? Can't we just
put a CHECK_FOR_INTERRUPTS() at the top of the outermost loop, inside
verify_heapam()?
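
Something along these lines, say (illustrative; the variable names
follow verify_heapam.c's style but are not exact):

	#include "miscadmin.h"		/* CHECK_FOR_INTERRUPTS() */

	for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
	{
		/* Allow query cancel and backend termination to be serviced. */
		CHECK_FOR_INTERRUPTS();

		/* ... read the page and check each tuple on it ... */
	}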

Not sure if pg_amcheck itself is a factor here too -- didn't get that far.

Thanks
-- 
Peter Geoghegan




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-26 Thread Andrew Dunstan


On 8/26/21 3:57 PM, Robert Haas wrote:
> On Thu, Aug 26, 2021 at 3:42 PM Magnus Hagander  wrote:
>> Ugly as it is, I wonder if there's a chance we could just process all
>> the headers at install time and inject the PGDLLIMPORT. We know which
>> symbols those are on account of what we're getting in the DEF file.
>>
>> Not saying that's not a very ugly solution, but it might work?
> If it's ugly, that might mean it's a bad idea and we shouldn't do it
> ... but if it can be made not-too-ugly, it would certainly be nice to
> be able to stop worrying about this.
>

How is this going to affect msys builds? No gendef there IIRC. I guess
some similar procedure might be possible ...


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Async-unsafe functions in signal handlers

2021-08-26 Thread Tom Lane
Robert Haas  writes:
> That is a great question. I think bgworker_die() is extremely
> dangerous and ought to be removed. I can't see how that can ever be
> safe.

Agreed, it looks pretty dangerous from here.  The equivalent (and
far better battle-tested) signal handlers in postgres.c are a lot
more circumspect --- they will call stuff that's unsafe per POSIX,
but not from just any interrupt point.
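
(For reference, bgworker_die() reports straight from the SIGTERM
handler, roughly like this:

	static void
	bgworker_die(SIGNAL_ARGS)
	{
		PG_SETMASK(&BlockSig);

		ereport(FATAL,
				(errcode(ERRCODE_ADMIN_SHUTDOWN),
				 errmsg("terminating background worker \"%s\" due to administrator command",
						MyBgworkerEntry->bgw_name)));
	}

An ereport() from an arbitrary interrupt point is exactly what the
postgres.c handlers take care to avoid.)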

(BTW, I think it's pretty silly to imagine that adding backtrace()
calls inside ereport is making things any more dangerous.  ereport
has pretty much always carried a likelihood of calling malloc(),
for example.)

> FloatExceptionHandler() is a bit different because in theory the
> things that could trigger SIGFPE are relatively limited, and in theory
> we know that those are safe places to ereport(). But I'm not very
> convinced that this is really true. Among other things, kill -FPE
> could be executed any time, but even aside from that, I doubt we have
> exhaustive knowledge of everything in the code that could trigger a
> floating point exception.

On the one hand, that's theoretically true, and on the other hand,
that code's been like that since the last century and I'm unaware
of any actual problems.  There are not many places in the backend
that do arithmetic that's likely to trigger SIGFPE.  Besides which,
what's the alternative?  I suppose we could SIG_IGN SIGFPE, but
that is almost certainly going to create real problems (i.e. missed
error cases) while removing only hypothetical ones.

(The "DBA randomly issues 'kill -FPE'" scenario can be dismissed,
I think --- surely that is in the category of "when you break it,
you get to keep both pieces".)

The larger subtext here is that just because it's undefined per
POSIX doesn't necessarily mean it's unsafe in our use-pattern.

regards, tom lane




Re: Async-unsafe functions in signal handlers

2021-08-26 Thread Robert Haas
On Wed, Aug 25, 2021 at 10:22 AM Denis Smirnov  wrote:
> I am going to refactor Greenplum backtraces for error messages and want to 
> make it more compatible with PostgreSQL code. Backtraces in PostgreSQL were 
> introduced by 71a8a4f6e36547bb060dbcc961ea9b57420f7190 commit (original 
> discussion 
> https://www.postgresql.org/message-id/CAMsr+YGL+yfWE=jvbubnpwtrrzney7hj07+zt4byjdvp4sz...@mail.gmail.com
>  ) and rely on backtrace() and backtrace_symbols() functions. They are used 
> inside errfinish() that is wrapped by ereport() macros. ereport() is invoked 
> inside bgworker_die() and FloatExceptionHandler() signal handlers. I am 
> confused with this fact - both backtrace functions are async-unsafe: 
> backtrace_symbols() - always, backtrace() - only for the first call due to 
> dlopen. I wonder why does PostgreSQL use async-unsafe functions in signal 
> handlers?

That is a great question. I think bgworker_die() is extremely
dangerous and ought to be removed. I can't see how that can ever be
safe.

FloatExceptionHandler() is a bit different because in theory the
things that could trigger SIGFPE are relatively limited, and in theory
we know that those are safe places to ereport(). But I'm not very
convinced that this is really true. Among other things, kill -FPE
could be executed any time, but even aside from that, I doubt we have
exhaustive knowledge of everything in the code that could trigger a
floating point exception.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-26 Thread Robert Haas
On Thu, Aug 26, 2021 at 3:42 PM Magnus Hagander  wrote:
> Ugly as it is, I wonder if there's a chance we could just process all
> the headers at install time and inject the PGDLLIMPORT. We know which
> symbols those are on account of what we're getting in the DEF file.
>
> Not saying that's not a very ugly solution, but it might work?

If it's ugly, that might mean it's a bad idea and we shouldn't do it
... but if it can be made not-too-ugly, it would certainly be nice to
be able to stop worrying about this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-26 Thread Magnus Hagander
On Thu, Aug 26, 2021 at 3:38 AM Julien Rouhaud  wrote:
>
> On Thu, Aug 26, 2021 at 1:51 AM Alvaro Herrera  
> wrote:
> >
> > On 2021-Aug-25, Magnus Hagander wrote:
> >
> > > The thing we need the PGDLLIMPORT definition for is to *import* them
> > > on the other end?
> >
> > Oh ... so modules that are willing to cheat can include their own
> > declarations of the variables they need, and mark them __declspec
> > (dllimport)?
>
> I just tried and msvc doesn't like it.  It errors out with a C2370
> error "redefinition; different storage class".  According to
> https://docs.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2370
> changing __declspec() on the other side is not possible.

Yeah, but that does move the problem to the other side doesn't it? So
if you (as a pure test of course) were to remove the variable
completely from the included header and just declare it manually with
PGDLLSPEC in your file, it should work?

Ugly as it is, I wonder if there's a chance we could just process all
the headers at install time and inject the PGDLLIMPORT. We know which
symbols those are on account of what we're getting in the DEF file.

Not saying that's not a very ugly solution, but it might work?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: verify_heapam for sequences?

2021-08-26 Thread Mark Dilger


> On Aug 26, 2021, at 3:03 AM, Peter Eisentraut 
>  wrote:
> 
> 
> Is there a reason why contrib/amcheck/verify_heapam.c does not want to run on 
> sequences?  If I take out the checks, it appears to work.  Is this an 
> oversight?  Or if there is a reason, maybe it could be stated in a comment, 
> at least.

Testing the corruption checking logic on all platforms is a bit arduous, 
because the data layout on disk changes with alignment differences, endianness, 
etc.  The work I did with Tom's help finally got good test coverage across the 
entire buildfarm, but that test (contrib/amcheck/t/001_verify_heapam.pl) 
doesn't work for sequences even on my one platform (mac laptop).

I have added a modicum of test coverage for sequences in the attached WIP 
patch, which is enough to detect sequence corruption on my laptop.  It would 
have to be tested across the buildfarm after being extended to cover more 
cases.  As it stands now, it uses blunt force to corrupt the relation, and only 
verifies that verify_heapam() returns some corruption, not that it reports the 
right corruption.

I understand that sequences are really just heap tables, and since we already 
test corrupted heap tables, we could assume that we already have sufficient 
coverage.  I'm not entirely comfortable with that, though, because future patch 
authors who modify how tables or sequences work are not necessarily going to 
think carefully about whether their modifications invalidate that assumption.



v1-0001-WIP-patch-to-support-amcheck-of-sequences.patch.WIP
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 01:24:46PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote:
> > > Yes, we're talking about either incremental (or perhaps differential)
> > > backup where only the files which are actually different would be backed
> > > up.  Just like with PG, I can't provide any complete guarantees that
> > > we'd be able to actually make this possible after a major version with
> > > pgBackRest with this change, but it definitely isn't possible *without*
> > > this change.  I can't see any reason why we wouldn't be able to do a
> > > checksum-based incremental backup though (which would be *much* faster
> > > than a regular backup) once this change is made and have that be a
> > > reliable and trustworthy backup.  I'd want to think about it more and
> > > discuss it with David in some detail before saying if we could maybe
> > > perform a timestamp-based incremental backup (without checksum'ing the
> > > files, as we do in normal situations), but that would really just be a
> > > bonus.
> > 
> > Well, it would be nice to know exactly how it would help pgBackRest if
> > that is one of the reasons we are adding this feature.
> 
> pgBackRest keeps a manifest for every file in the PG data directory that
> is backed up and we identify that file by the filename.  Further, we
> calculate a checksum for every file.  If the filenames didn't change
> then we'd be able to compare the file in the new cluster against the
> file and checksum in the manifest in order to be able to perform the
> incremental/differential backup.  We don't store the inodes in the
> manifest though, and we don't have any concept of looking at multiple
> data directories at the same time or anything like that (which would
> also mean that the old data directory would have to be kept around for
> that to even work, which seems like a good bit of additional
> complication and risk that someone might start up the old cluster by
> accident..).
> 
> That's how it'd be very helpful to pgBackRest for the filenames to be
> preserved across pg_upgrade's.

OK, that is clear.

> > > > > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > > > > this code for that reason?
> > > > > 
> > > > > That this would help with TDE (of which there seems little doubt...) 
> > > > > is
> > > > > an additional benefit to this.  Specifically, taking the existing work
> > > > > that's already been done to allow block-by-block encryption and
> > > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > > > > number as the IV, just like many disk encryption systems do, avoids 
> > > > > the
> > > > > concerns that were brought up about using LSN for the IV with CTR and
> > > > > it's certainly not difficult to do, but it does depend on this change.
> > > > > This was all discussed previously and it sure looks like a sensible
> > > > > approach to use that mirrors what many other systems already do
> > > > > successfully.
> > > > 
> > > > Well, I would think we would not add this for TDE until we were sure
> > > > someone was working on adding TDE.
> > > 
> > > That this would help with TDE is what I'd consider an added bonus.
> > 
> > Not if we have no plans to implement TDE, which was my point.  Why not
> > wait to see if we are actually going to implement TDE rather than adding
> > it now.  It is just so obvious, why do I have to state this?
> 
> There's been multiple years of effort put into implementing TDE and I'm
> sure hopeful that it continues as I'm trying to put effort into moving
> it forward myself.  I'm a bit baffled by the idea that we're just

Well, this is the first time I am hearing this publicly.

> suddenly going to stop putting effort into TDE as it is brought up time
> and time again by clients that I've talked to as one of the few reasons
> they haven't moved to PG yet- I can't believe that hasn't been
> experienced by folks at other organizations too, I mean, there's people
> maintaining forks of PG specifically for TDE ...

Agreed.

> > > I've certainly done it and I'd be kind of surprised if others haven't,
> > > but I've also played a lot with pg_dump in various modes, so perhaps
> > > that's not a great representation.  I've definitely had to explain to
> > > clients why there's a whole different set of filenames after a
> > > pg_upgrade and why that is the case for an 'in place' upgrade before
> > > too.
> > 
> > Uh, so I guess I am right that few people have mentioned this in the
> > past.  Why were users caring about the file names?
> 
> This is a bit baffling to me.  Users and admins certainly care about
> what files their data is stored in and knowing how to find them.
> Covering the data directory structure is a commonly asked for part of
> the training that I regularly do for clients.

I just never thought people cared about the file names, since I have
never heard a complaint 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 01:20:38PM -0400, Robert Haas wrote:
> So I think this proposed change is in the safe direction. If
> relfilenodes were currently preserved and we wanted to make them not
> be preserved, then I think you would be quite right to say "whoa,
> whoa, that could be a problem." Indeed it could. If anyone then in the
> future wanted to introduce a dependency on them staying the same, they
> would have a problem. However, nothing in the server itself can care
> about relfilenodes - or anything else - being *different* across a
> pg_upgrade. The whole point of pg_upgrade is to make it feel like you
> have the same database after you run it as you did before you ran it,
> even though under the hood a lot of surgery has been done. Barring
> bugs, you can never be sad about there being too LITTLE difference
> between the post-upgrade database and the pre-upgrade database.

Yes, this makes sense, and it is good we have stated the possible
benefits now:

*  pgBackRest
*  pg_upgrade diagnostics
*  TDE (maybe)

We can eventually evaluate the value of this based on those items.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote:
> > Yes, we're talking about either incremental (or perhaps differential)
> > backup where only the files which are actually different would be backed
> > up.  Just like with PG, I can't provide any complete guarantees that
> > we'd be able to actually make this possible after a major version with
> > pgBackRest with this change, but it definitely isn't possible *without*
> > this change.  I can't see any reason why we wouldn't be able to do a
> > checksum-based incremental backup though (which would be *much* faster
> > than a regular backup) once this change is made and have that be a
> > reliable and trustworthy backup.  I'd want to think about it more and
> > discuss it with David in some detail before saying if we could maybe
> > perform a timestamp-based incremental backup (without checksum'ing the
> > files, as we do in normal situations), but that would really just be a
> > bonus.
> 
> Well, it would be nice to know exactly how it would help pgBackRest if
> that is one of the reasons we are adding this feature.

pgBackRest keeps a manifest for every file in the PG data directory that
is backed up and we identify that file by the filename.  Further, we
calculate a checksum for every file.  If the filenames didn't change
then we'd be able to compare the file in the new cluster against the
file and checksum in the manifest in order to be able to perform the
incremental/differential backup.  We don't store the inodes in the
manifest though, and we don't have any concept of looking at multiple
data directories at the same time or anything like that (which would
also mean that the old data directory would have to be kept around for
that to even work, which seems like a good bit of additional
complication and risk that someone might start up the old cluster by
accident..).

That's how it'd be very helpful to pgBackRest for the filenames to be
preserved across pg_upgrade runs.
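
To make that concrete, a minimal sketch of the per-file decision
(simplified; the real manifest carries far more than a flat array, and
these types are made up for illustration):

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* One manifest entry: a data-directory-relative name plus checksum. */
    typedef struct ManifestEntry
    {
        const char *path;
        const char *checksum;
    } ManifestEntry;

    /* Look up a file in the prior backup's manifest by name. */
    static const ManifestEntry *
    manifest_find(const ManifestEntry *entries, size_t n, const char *path)
    {
        for (size_t i = 0; i < n; i++)
            if (strcmp(entries[i].path, path) == 0)
                return &entries[i];
        return NULL;
    }

    /* Copy a file only if it is new or its content checksum changed. */
    static bool
    file_needs_backup(const ManifestEntry *prior, size_t n,
                      const char *path, const char *checksum)
    {
        const ManifestEntry *entry = manifest_find(prior, n, path);

        if (entry == NULL)
            return true;        /* not in the prior backup: must copy */
        return strcmp(entry->checksum, checksum) != 0;
    }

If pg_upgrade renames every relation file, the name-keyed lookup misses
for all of them and everything gets copied again; with preserved
filenames the lookup hits and only genuinely changed files are copied.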

> > > > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > > > this code for that reason?
> > > > 
> > > > That this would help with TDE (of which there seems little doubt...) is
> > > > an additional benefit to this.  Specifically, taking the existing work
> > > > that's already been done to allow block-by-block encryption and
> > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > > > number as the IV, just like many disk encryption systems do, avoids the
> > > > concerns that were brought up about using LSN for the IV with CTR and
> > > > it's certainly not difficult to do, but it does depend on this change.
> > > > This was all discussed previously and it sure looks like a sensible
> > > > approach to use that mirrors what many other systems already do
> > > > successfully.
> > > 
> > > Well, I would think we would not add this for TDE until we were sure
> > > someone was working on adding TDE.
> > 
> > That this would help with TDE is what I'd consider an added bonus.
> 
> Not if we have no plans to implement TDE, which was my point.  Why not
> wait to see if we are actually going to implement TDE rather than adding
> > it now?  It is just so obvious; why do I have to state this?

There have been multiple years of effort put into implementing TDE and I'm
sure hopeful that it continues as I'm trying to put effort into moving
it forward myself.  I'm a bit baffled by the idea that we're just
suddenly going to stop putting effort into TDE as it is brought up time
and time again by clients that I've talked to as one of the few reasons
they haven't moved to PG yet; I can't believe that hasn't been
experienced by folks at other organizations too.  I mean, there are people
maintaining forks of PG specifically for TDE ...

Seems like maybe we were both seeing something as obvious to the other
that wasn't actually the case.

> > > > > > make this required, general improved sanity when working with 
> > > > > > pg_upgrade
> > > > > > is frankly a benefit in its own right too...).  If the additional 
> > > > > > code
> > > > > 
> > > > > How?  I am not aware of any advantage except cosmetic.
> > > > 
> > > > Having to resort to matching up inode numbers between the two clusters
> > > > after a pg_upgrade to figure out what files are actually the same
> > > > underneath is a pain that goes beyond just cosmetics imv.  Removing that
> > > > additional level that admins, and developers for that matter, have to go
> > > > through would be a nice improvement on its own.
> > > 
> > > OK, I was just not aware anyone did that, since I have never heard anyone
> > > complain about it before.
> > 
> > I've certainly done it and I'd be kind of surprised if others haven't,
> > but I've also played a lot with pg_dump in various modes, so perhaps
> > that's not a great representation.  I've definitely had to explain to
> > clients why there's a whole different set of filenames after a
> > pg_upgrade 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Robert Haas
On Thu, Aug 26, 2021 at 12:51 PM Bruce Momjian  wrote:
> I just don't want to add requirements/complexity to pg_upgrade without
> clearly stated reasons because future database changes will need to
> honor this new preservation behavior.

Well, I agree that it's good to have reasons clearly stated and I hope
that at this point you agree that they have been. Whether you agree
with them is another question, but I hope you at least agree that they
have been stated.

As far as the other part of your concern, what I think makes this
change pretty safe is that we are preserving more things rather than
fewer. I can imagine some server behavior depending on something being
the same between the old and the new clusters, but it is harder to
imagine a dependency on something not being preserved. For example, we
know that the OIDs of pg_type rows have to be the same in the old and
new cluster because arrays are stored on disk with the type OIDs
included. Therefore those need to be preserved. If in the future we
changed things so that arrays - and other container types - did not
include the type OIDs in the on-disk representation, then perhaps it
would no longer be necessary to preserve the OIDs of pg_type rows
across a pg_upgrade. However, it would not be harmful to do so. It
just might not be required.

So I think this proposed change is in the safe direction. If
relfilenodes were currently preserved and we wanted to make them not
be preserved, then I think you would be quite right to say "whoa,
whoa, that could be a problem." Indeed it could. If anyone then in the
future wanted to introduce a dependency on them staying the same, they
would have a problem. However, nothing in the server itself can care
about relfilenodes - or anything else - being *different* across a
pg_upgrade. The whole point of pg_upgrade is to make it feel like you
have the same database after you run it as you did before you ran it,
even though under the hood a lot of surgery has been done. Barring
bugs, you can never be sad about there being too LITTLE difference
between the post-upgrade database and the pre-upgrade database.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote:
> Yes, we're talking about either incremental (or perhaps differential)
> backup where only the files which are actually different would be backed
> up.  Just like with PG, I can't provide any complete guarantees that
> we'd be able to actually make this possible after a major version with
> pgBackRest with this change, but it definitely isn't possible *without*
> this change.  I can't see any reason why we wouldn't be able to do a
> checksum-based incremental backup though (which would be *much* faster
> than a regular backup) once this change is made and have that be a
> reliable and trustworthy backup.  I'd want to think about it more and
> discuss it with David in some detail before saying if we could maybe
> perform a timestamp-based incremental backup (without checksum'ing the
> files, as we do in normal situations), but that would really just be a
> bonus.

Well, it would be nice to know exactly how it would help pgBackRest if
that is one of the reasons we are adding this feature.

> > > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > > this code for that reason?
> > > 
> > > That this would help with TDE (of which there seems little doubt...) is
> > > an additional benefit to this.  Specifically, taking the existing work
> > > that's already been done to allow block-by-block encryption and
> > > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > > number as the IV, just like many disk encryption systems do, avoids the
> > > concerns that were brought up about using LSN for the IV with CTR and
> > > it's certainly not difficult to do, but it does depend on this change.
> > > This was all discussed previously and it sure looks like a sensible
> > > approach to use that mirrors what many other systems already do
> > > successfully.
> > 
> > Well, I would think we would not add this for TDE until we were sure
> > someone was working on adding TDE.
> 
> That this would help with TDE is what I'd consider an added bonus.

Not if we have no plans to implement TDE, which was my point.  Why not
wait to see if we are actually going to implement TDE rather than adding
it now?  It is just so obvious; why do I have to state this?

> > > > > make this required, general improved sanity when working with 
> > > > > pg_upgrade
> > > > > is frankly a benefit in its own right too...).  If the additional code
> > > > 
> > > > How?  I am not aware of any advantage except cosmetic.
> > > 
> > > Having to resort to matching up inode numbers between the two clusters
> > > after a pg_upgrade to figure out what files are actually the same
> > > underneath is a pain that goes beyond just cosmetics imv.  Removing that
> > > additional level that admins, and developers for that matter, have to go
> > > through would be a nice improvement on its own.
> > 
> > OK, I was just not aware anyone did that, since I have never heard anyone
> > complain about it before.
> 
> I've certainly done it and I'd be kind of surprised if others haven't,
> but I've also played a lot with pg_dump in various modes, so perhaps
> that's not a great representation.  I've definitely had to explain to
> clients why there's a whole different set of filenames after a
> pg_upgrade and why that is the case for an 'in place' upgrade before
> too.

Uh, so I guess I am right that few people have mentioned this in the
past.  Why do users care about the file names?

> > > > > was a huge burden or even a moderate one then that might be an 
> > > > > argument
> > > > > against, but it hardly sounds like it will be given Robert's thorough
> > > > > analysis so far and the (admittedly not complete, but not that far 
> > > > > from
> > > > > it based on the DB OID review) proposed patch.
> > > > 
> > > > I am fine to add it if it is minor, but I want to see the calculus of
> > > > its value vs complexity, which I have not seen spelled out.
> > > 
> > > I feel that this, along with the prior discussions, spells it out
> > > sufficiently given the patch's complexity looks to be reasonably minor
> > > and very similar to the existing things that pg_upgrade already does.
> > > Had pg_upgrade done this in the first place, I don't think there would
> > > have been nearly this amount of discussion about it.
> > 
> > Well, there is a reason pg_upgrade didn't initially do this --- because
> > it adds complexity, and potentially makes future changes to pg_upgrade
> > necessary if the server behavior changes.
> 
> I have a very hard time seeing what changes might happen in the server
> in this space that wouldn't have an impact on pg_upgrade, with or
> without this.

I don't know, but I have to ask since I can't know the future, so any
"preseration" has to be studied.

> > I am not saying this change is wrong, but I think the reasons need to be
> > stated in this thread, rather than just moving forward.
> 
> Ok, they've been stated and it seems to at least Robert 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Aug 26, 2021 at 12:34:56PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > > On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > > > > > Anyone see a flaw in that analysis?
> > > > > 
> > > > > I am still waiting to hear the purpose of this preservation.  As long 
> > > > > as
> > > > > you don't apply the patch, I guess I will just stop asking.
> > > > 
> > > > I'm a bit confused why this question keeps coming up as we've discussed
> > > > multiple reasons (incremental backups, possible use for TDE which would
> > > 
> > > I have not seen much explanation on pgBackRest, except me mentioning
> > > it.  Is this something really useful?
> > 
> > Being able to quickly perform a backup on a newly upgraded cluster would
> > certainly be valuable and that's definitely not possible today due to
> > all of the filenames changing.
> 
> You mean incremental backup, right?  I was told this by the pgBackRest
> developers during PGCon, but I have not heard it stated publicly, so I
> hate to rely on hearsay rather than a public statement.

Yes, we're talking about either incremental (or perhaps differential)
backup where only the files which are actually different would be backed
up.  Just like with PG, I can't provide any complete guarantees that
we'd be able to actually make this possible after a major version with
pgBackRest with this change, but it definitely isn't possible *without*
this change.  I can't see any reason why we wouldn't be able to do a
checksum-based incremental backup though (which would be *much* faster
than a regular backup) once this change is made and have that be a
reliable and trustworthy backup.  I'd want to think about it more and
discuss it with David in some detail before saying if we could maybe
perform a timestamp-based incremental backup (without checksum'ing the
files, as we do in normal situations), but that would really just be a
bonus.

> > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > this code for that reason?
> > 
> > That this would help with TDE (of which there seems little doubt...) is
> > an additional benefit to this.  Specifically, taking the existing work
> > that's already been done to allow block-by-block encryption and
> > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > number as the IV, just like many disk encryption systems do, avoids the
> > concerns that were brought up about using LSN for the IV with CTR and
> > it's certainly not difficult to do, but it does depend on this change.
> > This was all discussed previously and it sure looks like a sensible
> > approach to use that mirrors what many other systems already do
> > successfully.
> 
> Well, I would think we would not add this for TDE until we were sure
> someone was working on adding TDE.

That this would help with TDE is what I'd consider an added bonus.

> > > > make this required, general improved sanity when working with pg_upgrade
> > > > is frankly a benefit in its own right too...).  If the additional code
> > > 
> > > How?  I am not aware of any advantage except cosmetic.
> > 
> > Having to resort to matching up inode numbers between the two clusters
> > after a pg_upgrade to figure out what files are actually the same
> > underneath is a pain that goes beyond just cosmetics imv.  Removing that
> > additional level that admins, and developers for that matter, have to go
> > through would be a nice improvement on its own.
> 
> OK, I was just not aware anyone did that, since I have never heard anyone
> complain about it before.

I've certainly done it and I'd be kind of surprised if others haven't,
but I've also played a lot with pg_dump in various modes, so perhaps
that's not a great representation.  I've definitely had to explain to
clients why there's a whole different set of filenames after a
pg_upgrade and why that is the case for an 'in place' upgrade before
too.

> > > > was a huge burden or even a moderate one then that might be an argument
> > > > against, but it hardly sounds like it will be given Robert's thorough
> > > > analysis so far and the (admittedly not complete, but not that far from
> > > > it based on the DB OID review) proposed patch.
> > > 
> > > I am fine to add it if it is minor, but I want to see the calculus of
> > > its value vs complexity, which I have not seen spelled out.
> > 
> > I feel that this, along with the prior discussions, spells it out
> > sufficiently given the patch's complexity looks to be reasonably minor
> > and very similar to the existing things that pg_upgrade already does.
> > Had pg_upgrade done this in the first place, I don't think there would
> > have been nearly this amount of discussion about it.
> 
> Well, there is a reason pg_upgrade didn't 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 12:37:19PM -0400, Robert Haas wrote:
> On Thu, Aug 26, 2021 at 11:48 AM Bruce Momjian  wrote:
> > I am fine to add it if it is minor, but I want to see the calculus of
> > its value vs complexity, which I have not seen spelled out.
> 
> I don't think it's going to be all that complicated, but we're going
> to have to wait until we have something closer to a final patch before
> we can really evaluate that. I am honestly a little puzzled about why
> you think complexity is such a big issue for this patch in particular.
> I feel we do probably several hundred things every release cycle that
> are more complicated than this, so it doesn't seem like this is
> particularly extraordinary or needs a lot of extra scrutiny. I do
> think there is some risk that there are messy cases we can't handle
> cleanly, but if that becomes an issue then I'll abandon the effort
> until a solution can be found. I'm not trying to relentlessly drive
> something through that is a bad idea on principle.
> 
> I agree with all Stephen's comments, too.

I just don't want to add requirements/complexity to pg_upgrade without
clearly stated reasons because future database changes will need to
honor this new preservation behavior.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 12:34:56PM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote:
> > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > > > > Anyone see a flaw in that analysis?
> > > > 
> > > > I am still waiting to hear the purpose of this preservation.  As long as
> > > > you don't apply the patch, I guess I will just stop asking.
> > > 
> > > I'm a bit confused why this question keeps coming up as we've discussed
> > > multiple reasons (incremental backups, possible use for TDE which would
> > 
> > I have not seen much explanation on pgBackRest, except me mentioning
> > it.  Is this something really useful?
> 
> Being able to quickly perform a backup on a newly upgraded cluster would
> certainly be valuable and that's definitely not possible today due to
> all of the filenames changing.

You mean incremental backup, right?  I was told this by the pgBackRest
developers during PGCon, but I have not heard it stated publicly, so I
hate to rely on hearsay rather than a public statement.

> > As far as TDE, I haven't seen any concrete plan for that, so why add
> > this code for that reason?
> 
> That this would help with TDE (of which there seems little doubt...) is
> an additional benefit to this.  Specifically, taking the existing work
> that's already been done to allow block-by-block encryption and
> adjusting it for AES-XTS and then using the db-dir+relfileno+block
> number as the IV, just like many disk encryption systems do, avoids the
> concerns that were brought up about using LSN for the IV with CTR and
> it's certainly not difficult to do, but it does depend on this change.
> This was all discussed previously and it sure looks like a sensible
> approach to use that mirrors what many other systems already do
> successfully.

Well, I would think we would not add this for TDE until we were sure
someone was working on adding TDE.

> > > make this required, general improved sanity when working with pg_upgrade
> > > is frankly a benefit in its own right too...).  If the additional code
> > 
> > How?  I am not aware of any advantage except cosmetic.
> 
> Having to resort to matching up inode numbers between the two clusters
> after a pg_upgrade to figure out what files are actually the same
> underneath is a pain that goes beyond just cosmetics imv.  Removing that
> additional level that admins, and developers for that matter, have to go
> through would be a nice improvement on its own.

OK, I was just not aware anyone did that, since I have never heard anyone
complain about it before.
 
> > > was a huge burden or even a moderate one then that might be an argument
> > > against, but it hardly sounds like it will be given Robert's thorough
> > > analysis so far and the (admittedly not complete, but not that far from
> > > it based on the DB OID review) proposed patch.
> > 
> > I am fine to add it if it is minor, but I want to see the calculus of
> > its value vs complexity, which I have not seen spelled out.
> 
> I feel that this, along with the prior discussions, spells it out
> sufficiently given the patch's complexity looks to be reasonably minor
> and very similar to the existing things that pg_upgrade already does.
> Had pg_upgrade done this in the first place, I don't think there would
> have been nearly this amount of discussion about it.

Well, there is a reason pg_upgrade didn't initially do this --- because
it adds complexity, and potentially makes future changes to pg_upgrade
necessary if the server behavior changes.

I am not saying this change is wrong, but I think the reasons need to be
stated in this thread, rather than just moving forward.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Aug 26, 2021 at 11:39 AM Stephen Frost  wrote:
> > This looks like a pretty good analysis to me.  As it relates to the
> > question about allowing users to specify an OID, I'd be inclined to
> > allow it but only for OIDs >64k.  We've certainly reserved things in the
> > past and I don't see any issue with having that reservation here, but if
> > we're going to build the capability to specify the OID into CREATE
> > DATABASE then it seems a bit odd to disallow users from using it, as
> > long as we're preventing them from causing problems with it.
> >
> > Are there issues that you see with allowing users to specify the OID
> > even with the >64k restriction..?  I can't think of one offhand but
> > perhaps I'm missing something.
> 
> So I actually should have said 16k here, not 64k, as somebody already
> pointed out to me off-list. Whee!

Hah, yes, of course.

> I don't know of a reason not to let people do that, other than that it
> seems like an attractive nuisance. People will do it and it will fail
> because they chose a duplicate OID, or they'll complain that a regular
> dump and restore didn't preserve their database OIDs, or maybe they'll
> expect that they can copy a database from one cluster to another
> because they gave it the same OID! That said, I don't see a great harm
> in it. It just seems to me like exposing knobs to users that don't
> seem to have any legitimate use may be borrowing trouble.

We're going to have to gate this somehow to allow the OIDs under 16k to
be used, so it seems like what you're suggesting is that we have that
gate in place but then allow any OID to be used if you've crossed that
gate?

That is, if we do something like:

SELECT pg_catalog.binary_upgrade_allow_setting_db_oid();
CREATE DATABASE blah WITH OID 1234;

for pg_upgrade, then users who are interested may well figure out how
to do that themselves if they decide they want to set the OID; whereas
if it 'just works' provided they don't use an OID that is too low,
maybe they won't try to bypass the restriction against using system
OIDs?
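
To be concrete about the gate, a rough sketch (the helper and its
placement are made up for illustration; FirstNormalObjectId is 16384 in
the headers):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t Oid;
    #define InvalidOid ((Oid) 0)
    #define FirstNormalObjectId 16384   /* matches PostgreSQL's transam.h */

    /*
     * Hypothetical gate for CREATE DATABASE ... WITH OID: user-specified
     * OIDs below 16384 are accepted only in binary-upgrade mode.
     */
    static bool
    db_oid_is_allowed(Oid requested, bool is_binary_upgrade)
    {
        if (requested == InvalidOid)
            return true;                /* no OID requested: always fine */
        if (requested < FirstNormalObjectId)
            return is_binary_upgrade;   /* reserved range: upgrade only */
        return true;                    /* normal range: allowed */
    }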

Ok, I'll give you that this is a stretch, and I'm on the fence about
whether it's worthwhile to include and document and whether, as you say,
it's inviting trouble to allow users to set it.  Users do seem to have a
knack for finding things even when they aren't documented and then we
get to deal with those complaints too. :)

Perhaps others have some stronger feelings one way or another.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Robert Haas
On Thu, Aug 26, 2021 at 11:48 AM Bruce Momjian  wrote:
> I am fine to add it if it is minor, but I want to see the calculus of
> its value vs complexity, which I have not seen spelled out.

I don't think it's going to be all that complicated, but we're going
to have to wait until we have something closer to a final patch before
we can really evaluate that. I am honestly a little puzzled about why
you think complexity is such a big issue for this patch in particular.
I feel we do probably several hundred things every release cycle that
are more complicated than this, so it doesn't seem like this is
particularly extraordinary or needs a lot of extra scrutiny. I do
think there is some risk that there are messy cases we can't handle
cleanly, but if that becomes an issue then I'll abandon the effort
until a solution can be found. I'm not trying to relentlessly drive
something through that is a bad idea on principle.

I agree with all Stephen's comments, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > > > Anyone see a flaw in that analysis?
> > > 
> > > I am still waiting to hear the purpose of this preservation.  As long as
> > > you don't apply the patch, I guess I will just stop asking.
> > 
> > I'm a bit confused why this question keeps coming up as we've discussed
> > multiple reasons (incremental backups, possible use for TDE which would
> 
> I have not seen much explanation on pgBackRest, except me mentioning
> it.  Is this something really useful?

Being able to quickly perform a backup on a newly upgraded cluster would
certainly be valuable and that's definitely not possible today due to
all of the filenames changing.

> As far as TDE, I haven't seen any concrete plan for that, so why add
> this code for that reason?

That this would help with TDE (of which there seems little doubt...) is
an additional benefit to this.  Specifically, taking the existing work
that's already been done to allow block-by-block encryption and
adjusting it for AES-XTS and then using the db-dir+relfileno+block
number as the IV, just like many disk encryption systems do, avoids the
concerns that were brought up about using LSN for the IV with CTR and
it's certainly not difficult to do, but it does depend on this change.
This was all discussed previously and it sure looks like a sensible
approach to use that mirrors what many other systems already do
successfully.
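
As a rough sketch of what that tweak construction could look like
(illustrative only; no TDE design is committed, and the field layout
here is an assumption):

    #include <stdint.h>
    #include <string.h>

    /*
     * Build a 16-byte AES-XTS tweak from identifiers that this change
     * would make stable across pg_upgrade.  A real design would also pin
     * down byte order; host order is used here for brevity.
     */
    static void
    build_xts_tweak(uint8_t tweak[16],
                    uint32_t tablespace_oid,
                    uint32_t database_oid,
                    uint32_t relfilenode,
                    uint32_t block_number)
    {
        uint32_t parts[4] = {tablespace_oid, database_oid,
                             relfilenode, block_number};

        memcpy(tweak, parts, sizeof(parts));    /* 4 x 4 bytes = 16 bytes */
    }

The point is that this only works as an upgrade-stable identifier if the
database OID and relfilenode survive pg_upgrade, which is exactly what
this thread proposes.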

> > make this required, general improved sanity when working with pg_upgrade
> > is frankly a benefit in its own right too...).  If the additional code
> 
> How?  I am not aware of any advantage except cosmetic.

Having to resort to matching up inode numbers between the two clusters
after a pg_upgrade to figure out what files are actually the same
underneath is a pain that goes beyond just cosmetics imv.  Removing that
additional level that admins, and developers for that matter, have to go
through would be a nice improvement on its own.

> > was a huge burden or even a moderate one then that might be an argument
> > against, but it hardly sounds like it will be given Robert's thorough
> > analysis so far and the (admittedly not complete, but not that far from
> > it based on the DB OID review) proposed patch.
> 
> I am fine to add it if it is minor, but I want to see the calculus of
> its value vs complexity, which I have not seen spelled out.

I feel that this, along with the prior discussions, spells it out
sufficiently given the patch's complexity looks to be reasonably minor
and very similar to the existing things that pg_upgrade already does.
Had pg_upgrade done this in the first place, I don't think there would
have been nearly this amount of discussion about it.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [PoC] Federated Authn/z with OAUTHBEARER

2021-08-26 Thread Zhihong Yu
On Thu, Aug 26, 2021 at 9:13 AM Jacob Champion  wrote:

> On Wed, 2021-08-25 at 15:25 -0700, Zhihong Yu wrote:
> >
> > Hi,
> > For v2-0001-common-jsonapi-support-FRONTEND-clients.patch :
> >
> > +   /* Clean up. */
> > +   termJsonLexContext(&lex);
> >
> > At the end of termJsonLexContext(), empty is copied to lex. For stack
> > based JsonLexContext, the copy seems unnecessary.
> > Maybe introduce a boolean parameter for termJsonLexContext() to
> > signal that the copy can be omitted?
>
> Do you mean heap-based? i.e. destroyJsonLexContext() does an
> unnecessary copy before free? Yeah, in that case it's not super useful,
> but I think I'd want some evidence that the performance hit matters
> before optimizing it.
>
> Are there any other internal APIs that take a boolean parameter like
> that? If not, I think we'd probably just want to remove the copy
> entirely if it's a problem.
>
> > +#ifdef FRONTEND
> > +   /* make sure initialization succeeded */
> > +   if (lex->strval == NULL)
> > +   return JSON_OUT_OF_MEMORY;
> >
> > Should PQExpBufferBroken(lex->strval) be used for the check?
>
> It should be okay to continue if the strval is broken but non-NULL,
> since it's about to be reset. That has the fringe benefit of allowing
> the function to go as far as possible without failing, though that's
> probably a pretty weak justification.
>
> In practice, do you think that the probability of success is low enough
> that we should just short-circuit and be done with it?
>
> On Wed, 2021-08-25 at 16:24 -0700, Zhihong Yu wrote:
> >
> > For v2-0002-libpq-add-OAUTHBEARER-SASL-mechanism.patch :
> >
> > +   i_init_session();
> > +
> > +   if (!conn->oauth_client_id)
> > +   {
> > +   /* We can't talk to a server without a client identifier. */
> > +   appendPQExpBufferStr(&conn->errorMessage,
> > +                        libpq_gettext("no oauth_client_id is set for the connection"));
> > +   goto cleanup;
> >
> > Can the conn->oauth_client_id check be performed ahead
> > of i_init_session()? That way, ```goto cleanup``` can be replaced
> > with return.
>
> Yeah, I think that makes sense. FYI, this is probably one of the
> functions that will be rewritten completely once iddawc is removed.
>
> > +   if (!error_code || (strcmp(error_code, "authorization_pending")
> > +   && strcmp(error_code, "slow_down")))
> >
> > What if, in the future, there is an error code different from the above
> > two which doesn't represent the "OAuth token retrieval failed" scenario?
>
> We'd have to update our code; that would be a breaking change to the
> Device Authorization spec. Here's what it says today [1]:
>
>The "authorization_pending" and "slow_down" error codes define
>particularly unique behavior, as they indicate that the OAuth client
>should continue to poll the token endpoint by repeating the token
>request (implementing the precise behavior defined above).  If the
>client receives an error response with any other error code, it MUST
>stop polling and SHOULD react accordingly, for example, by displaying
>an error to the user.
>
> > For client_initial_response(),
> >
> > +   token_buf = createPQExpBuffer();
> > +   if (!token_buf)
> > +   goto cleanup;
> >
> > If token_buf is NULL, there doesn't seem to be anything to free. We
> > can return directly.
>
> That's true today, but implementations have a habit of changing. I
> personally prefer not to introduce too many exit points from a function
> that's already using goto. In my experience, that makes future
> maintenance harder.
>
> Thanks for the reviews! Have you been able to give the patchset a try
> with an OAuth deployment?
>
> --Jacob
>
> [1] https://datatracker.ietf.org/doc/html/rfc8628#section-3.5

Hi,
bq. destroyJsonLexContext() does an unnecessary copy before free? Yeah, in
that case it's not super useful,
but I think I'd want some evidence that the performance hit matters before
optimizing it.

Yes I agree.

bq. In practice, do you think that the probability of success is low enough
that we should just short-circuit and be done with it?

Haven't had a chance to try your patches out yet.
I will leave this to people who are more familiar with OAuth
implementation(s).

bq.  I personally prefer not to introduce too many exit points from a
function that's already using goto.

Fair enough.

Cheers


Re: [PoC] Federated Authn/z with OAUTHBEARER

2021-08-26 Thread Jacob Champion
On Wed, 2021-08-25 at 15:25 -0700, Zhihong Yu wrote:
> 
> Hi,
> For v2-0001-common-jsonapi-support-FRONTEND-clients.patch :
> 
> +   /* Clean up. */
> +   termJsonLexContext(&lex);
> 
> At the end of termJsonLexContext(), empty is copied to lex. For stack
> based JsonLexContext, the copy seems unnecessary.
> Maybe introduce a boolean parameter for termJsonLexContext() to
> signal that the copy can be omitted?

Do you mean heap-based? i.e. destroyJsonLexContext() does an
unnecessary copy before free? Yeah, in that case it's not super useful,
but I think I'd want some evidence that the performance hit matters
before optimizing it.

Are there any other internal APIs that take a boolean parameter like
that? If not, I think we'd probably just want to remove the copy
entirely if it's a problem.

> +#ifdef FRONTEND
> +   /* make sure initialization succeeded */
> +   if (lex->strval == NULL)
> +   return JSON_OUT_OF_MEMORY;
> 
> Should PQExpBufferBroken(lex->strval) be used for the check?

It should be okay to continue if the strval is broken but non-NULL,
since it's about to be reset. That has the fringe benefit of allowing
the function to go as far as possible without failing, though that's
probably a pretty weak justification.

In practice, do you think that the probability of success is low enough
that we should just short-circuit and be done with it?
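
For comparison, the short-circuit variant would look roughly like this
(a sketch against this patchset's FRONTEND path, not final code;
JSON_OUT_OF_MEMORY is the error code the patchset introduces):

    #ifdef FRONTEND
        /*
         * Treat a broken (failed-allocation) buffer like a NULL one and
         * bail out immediately; PQExpBufferBroken() is true for both.
         */
        if (PQExpBufferBroken(lex->strval))
            return JSON_OUT_OF_MEMORY;
        resetPQExpBuffer(lex->strval);
    #endif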

On Wed, 2021-08-25 at 16:24 -0700, Zhihong Yu wrote:
> 
> For v2-0002-libpq-add-OAUTHBEARER-SASL-mechanism.patch :
> 
> +   i_init_session();
> +
> +   if (!conn->oauth_client_id)
> +   {
> +   /* We can't talk to a server without a client identifier. */
> +   appendPQExpBufferStr(&conn->errorMessage,
> +                        libpq_gettext("no oauth_client_id is set for the connection"));
> +   goto cleanup;
> 
> Can the conn->oauth_client_id check be performed ahead
> of i_init_session()? That way, ```goto cleanup``` can be replaced
> with return.

Yeah, I think that makes sense. FYI, this is probably one of the
functions that will be rewritten completely once iddawc is removed.

> +   if (!error_code || (strcmp(error_code, "authorization_pending")
> +   && strcmp(error_code, "slow_down")))
> 
> What if, in the future, there is an error code different from the above
> two which doesn't represent the "OAuth token retrieval failed" scenario?

We'd have to update our code; that would be a breaking change to the
Device Authorization spec. Here's what it says today [1]:

   The "authorization_pending" and "slow_down" error codes define
   particularly unique behavior, as they indicate that the OAuth client
   should continue to poll the token endpoint by repeating the token
   request (implementing the precise behavior defined above).  If the
   client receives an error response with any other error code, it MUST
   stop polling and SHOULD react accordingly, for example, by displaying
   an error to the user.
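
So on the client side, the decision reduces to something like this (a
sketch of the spec-mandated behavior, not the patch's exact control
flow, which also handles the no-error success case separately):

    #include <string.h>

    typedef enum
    {
        POLL_AGAIN,     /* repeat the token request at the same interval */
        POLL_BACK_OFF,  /* add 5 seconds to the interval, then repeat */
        POLL_FAIL       /* stop polling and surface the error */
    } PollAction;

    /* Map a token-endpoint error code to the client's next move. */
    static PollAction
    device_flow_error_action(const char *error_code)
    {
        if (strcmp(error_code, "authorization_pending") == 0)
            return POLL_AGAIN;
        if (strcmp(error_code, "slow_down") == 0)
            return POLL_BACK_OFF;
        return POLL_FAIL;   /* any other error code MUST stop polling */
    }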

> For client_initial_response(),
> 
> +   token_buf = createPQExpBuffer();
> +   if (!token_buf)
> +   goto cleanup;
> 
> If token_buf is NULL, there doesn't seem to be anything to free. We
> can return directly.

That's true today, but implementations have a habit of changing. I
personally prefer not to introduce too many exit points from a function
that's already using goto. In my experience, that makes future
maintenance harder.

Thanks for the reviews! Have you been able to give the patchset a try
with an OAuth deployment?

--Jacob

[1] https://datatracker.ietf.org/doc/html/rfc8628#section-3.5


Re: speed up verifying UTF-8

2021-08-26 Thread Vladimir Sitnikov
> Attached is v23 incorporating the 32-bit transition table, with the
> necessary comment adjustments.

32bit table is nice.


Would you please replace
https://github.com/BobSteagall/utf_utils/blob/master/src/utf_utils.cpp URL
with
https://github.com/BobSteagall/utf_utils/blob/6b7a465265de2f5fa6133d653df0c9bdd73bbcf8/src/utf_utils.cpp
in the header of src/port/pg_utf8_fallback.c?

It would make the URL more stable in case the file gets renamed.

Vladimir


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Robert Haas
On Thu, Aug 26, 2021 at 11:39 AM Stephen Frost  wrote:
> This looks like a pretty good analysis to me.  As it relates to the
> question about allowing users to specify an OID, I'd be inclined to
> allow it but only for OIDs >64k.  We've certainly reserved things in the
> past and I don't see any issue with having that reservation here, but if
> we're going to build the capability to specify the OID into CREATE
> DATABASE then it seems a bit odd to disallow users from using it, as
> long as we're preventing them from causing problems with it.
>
> Are there issues that you see with allowing users to specify the OID
> even with the >64k restriction..?  I can't think of one offhand but
> perhaps I'm missing something.

So I actually should have said 16k here, not 64k, as somebody already
pointed out to me off-list. Whee!

I don't know of a reason not to let people do that, other than that it
seems like an attractive nuisance. People will do it and it will fail
because they chose a duplicate OID, or they'll complain that a regular
dump and restore didn't preserve their database OIDs, or maybe they'll
expect that they can copy a database from one cluster to another
because they gave it the same OID! That said, I don't see a great harm
in it. It just seems to me like exposing knobs to users that don't
seem to have any legitimate use may be borrowing trouble.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > > Anyone see a flaw in that analysis?
> > 
> > I am still waiting to hear the purpose of this preservation.  As long as
> > you don't apply the patch, I guess I will just stop asking.
> 
> I'm a bit confused why this question keeps coming up as we've discussed
> multiple reasons (incremental backups, possible use for TDE which would

I have not seen much explanation on pgBackRest, except me mentioning
it.  Is this something really useful?

As far as TDE, I haven't seen any concrete plan for that, so why add
this code for that reason?

> make this required, general improved sanity when working with pg_upgrade
> is frankly a benefit in its own right too...).  If the additional code

How?  I am not aware of any advantage except cosmetic.

> was a huge burden or even a moderate one then that might be an argument
> against, but it hardly sounds like it will be given Robert's thorough
> analysis so far and the (admittedly not complete, but not that far from
> it based on the DB OID review) proposed patch.

I am fine to add it if it is minor, but I want to see the calculus of
its value vs complexity, which I have not seen spelled out.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: badly calculated width of emoji in psql

2021-08-26 Thread Pavel Stehule
On Thu, Aug 26, 2021 at 5:25 PM Jacob Champion wrote:

> On Wed, 2021-08-25 at 16:15 -0400, John Naylor wrote:
> > On Tue, Aug 24, 2021 at 1:50 PM Jacob Champion 
> wrote:
> > >
> > > Does there need to be any sanity check for overlapping ranges between
> > > the combining and fullwidth sets? The Unicode data on a dev's machine
> > > would have to be broken somehow for that to happen, but it could
> > > potentially go undetected for a while if it did.
> >
> > It turns out I should have done that to begin with. In the Unicode
> > data, it apparently happens that a character can be both combining
> > and wide, and that will cause ranges to overlap in my scheme:
>
> I was looking for overlaps in my review, but I skipped right over that,
> sorry...
>
> On Thu, 2021-08-26 at 11:12 -0400, John Naylor wrote:
> > I pushed Jacob's patch with the addendum I shared upthread, plus a
> > comment about search order. I also took the liberty of changing the
> > author in the CF app to Jacob. Later I'll push detecting non-spacing
> > characters beyond the BMP.
>
> Thanks!
>

Great, thanks

Pavel


> --Jacob
>


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 11:35:01AM -0400, Robert Haas wrote:
> On Thu, Aug 26, 2021 at 11:24 AM Bruce Momjian  wrote:
> > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > > Anyone see a flaw in that analysis?
> >
> > I am still waiting to hear the purpose of this preservation.  As long as
> > you don't apply the patch, I guess I will just stop asking.
> 
> You make it sound like I didn't answer that question the last time you
> asked it, but I did.[1] I went back to the previous thread and found
> that, in fact, there's at least one email *from you* appearing to
> endorse that concept for reasons unrelated to TDE[2] and another where
> you appear to agree that it would be useful for TDE to do it.[3]
> Stephen Frost also wrote up his discussion during the Unconference and
> some of his reasons for liking the idea.[4]
> 
> If you've changed your mind about this being a good idea, or if you no
> longer think it's useful without TDE, that's fine. Everyone is
> entitled to change their opinion. But then please say that straight
> out. It baffles me why you're now acting as if it hasn't been
> discussed when it clearly has been, and both you and I were
> participants in that discussion.
> 
> [1] 
> https://www.postgresql.org/message-id/ca+tgmob7msyh3vray87usr22uakvvsyy4zbaqw2ao2cfoud...@mail.gmail.com
> [2] https://www.postgresql.org/message-id/20210601140949.gc22...@momjian.us
> [3] https://www.postgresql.org/message-id/20210527210023.gj5...@momjian.us
> [4] 
> https://www.postgresql.org/message-id/20210531201652.gy20...@tamriel.snowman.net

Yes, it would help pgBackRest's incremental backup, as reported by its
developers.  However, I have seen no discussion of whether that is a
useful enough reason to add the complexity of preserving this.  The TODO
list shows "Desirability" as the first item to be discussed, so I
expected that to be discussed first.  Also, with TDE not progressing
(and my approach not even needing this), I have not seen a full
discussion of whether this item is desirable given its complexity.

What I did see is this patch appear with no context for why it is useful
given our current plans, except for pgBackRest, which I think I
mentioned.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Aug 17, 2021 at 2:50 PM Robert Haas  wrote:
> > > Less sure that this is a good idea, though.  In particular, I do not
> > > think that you can make it work in the face of
> > > alter database template1 rename to oops;
> > > create database template1;
> >
> > That is a really good point. If we can't categorically force the OID
> > of those databases to have a particular, fixed value, and based on
> > this example that seems to be impossible, then there's always a
> > possibility that we might find a value in the old cluster that doesn't
> > happen to match what is present in the new cluster. Seen from that
> > angle, the problem is really with databases that are pre-existent in
> > the new cluster but whose contents still need to be dumped. Maybe we
> > could (optionally? conditionally?) drop those databases from the new
> > cluster and then recreate them with the OID that we want them to have.
> 
> Actually, we do that already. create_new_objects() runs pg_restore
> with --create for most databases, but with --clean --create for
> template1 and postgres. This means that template1 and postgres will
> always be recreated in the new cluster, and other databases are
> assumed not to exist in the new cluster and the upgrade will fail if
> they unexpectedly do. And the reason why pg_upgrade does that is that
> it wants to "propagate [the] database-level properties" of postgres
> and template1. So suppose we just make the database OID one of the
> database-level properties that we want to propagate. That should
> mostly just work, but where can things go wrong?
> 
> The only real failure mode is we try to create a database in the new
> cluster and find out that the OID is already in use. If the new OID
> that collides is >64k, then the user has messed with the new cluster
> before doing that. And since pg_upgrade is pretty clearly already
> assuming that you shouldn't do that, it's fine to also make that
> assumption in this case. We can disregard such cases as user error.
> 
> If the new OID that collides is <64k, then it must be colliding with
> template0, template1, or postgres in the new cluster, because those
> are the only databases that can have such OIDs since, currently, we
> don't allow users to specify an OID for a new database. And the
> problem cannot be with template1, because we hard-code its OID to 1.
> If there is a database with OID 1 in either cluster, it must be
> template1, and if there is a database with OID 1 in both clusters, it
> must be template1 in both cases, and we'll just drop and recreate it
> with OID 1 and everything is fine. So we need only consider template0
> and postgres, which are created with system-generated OIDs. And, it
> would be no issue if either of those databases had the same OID in the
> old and new cluster, so the only possible OID collision is one where
> the same system-generated OID was assigned to one of those databases
> in the old cluster and to the other in the new cluster.
> 
> First consider the case where template0 has OID, say, 13000, in the
> old cluster, and postgres has that OID in the new cluster. No problem
> occurs, because template0 isn't transferred anyway. The reverse
> direction is a problem, though. If postgres had been assigned OID
> 13000 in the old cluster and, by sheer chance, template0 had that OID
> in the new cluster, then the upgrade would fail, because it wouldn't
> be able to recreate the postgres database with the correct OID.
> 
> But that doesn't seem very difficult to fix. I think all we need to do
> is have initdb assign a fixed OID to template0 at creation time. Then,
> in any new release to which someone might be trying to upgrade, the
> system-generated OID assigned to postgres in the old release can't
> match the fixed OID assigned to template0 in the new release, so the
> one problem case is ruled out. We do need, however, to make sure that
> the assign-my-database-a-fixed-OID syntax is either entirely
> restricted to initdb & pg_upgrade or at least that OIDS < 64k can only
> be assigned in one of those modes. Otherwise, some creative person
> could manufacture new problem cases by setting up the source database
> so that the OID of one of their databases matches the fixed OID we
> gave to template0 or template1, or the system-generated OID for
> postgres in the new cluster.
> 
> In short, as far as I can see, all we need to do to preserve database
> OIDs across pg_upgrade is:
> 
> 1. Add a new syntax for creating a database with a given OID, and use
> it in pg_dump --binary-upgrade.
> 2. Don't let users use it at least for OIDs <64k, or maybe just don't
> let them use it at all.
> 3. But let initdb use it, and have initdb set the initial OID for
> template0 to a fixed value < 1. If the user changes it later, no
> problem; the cluster into which they are upgrading won't contain any
> databases with high-numbered OIDs.
> 
> Anyone see a 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > Anyone see a flaw in that analysis?
> 
> I am still waiting to hear the purpose of this preservation.  As long as
> you don't apply the patch, I guess I will just stop asking.

I'm a bit confused why this question keeps coming up as we've discussed
multiple reasons (incremental backups, possible use for TDE which would
make this required, general improved sanity when working with pg_upgrade
is frankly a benefit in its own right too...).  If the additional code
was a huge burden or even a moderate one then that might be an argument
against, but it hardly sounds like it will be given Robert's thorough
analysis so far and the (admittedly not complete, but not that far from
it based on the DB OID review) proposed patch.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: speed up verifying UTF-8

2021-08-26 Thread John Naylor
I wrote:

> Naively, the shift-based DFA requires 64-bit integers to encode the
transitions, but I recently came across an idea from Dougall Johnson of
using the Z3 SMT solver to pack the transitions into 32-bit integers [1].
That halves the size of the transition table for free. I adapted that
effort to the existing conventions in v22 and arrived at the attached
python script.
> [...]
> I'll include something like the attached text file diff in the next
patch. Some comments are now outdated, but this is good enough for
demonstration.

Attached is v23 incorporating the 32-bit transition table, with the
necessary comment adjustments.
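
For anyone following along, the validation loop itself is tiny once the
table is built; roughly (the packed table contents and state offsets
come from the generator script, so the constants here are only
illustrative):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Sketch of the shift-based DFA inner loop.  Each 32-bit table entry
     * packs, for one input byte, the next state for every current state;
     * shifting by the current state's bit offset selects it.
     */
    extern const uint32_t Utf8Transition[256];

    #define DFA_START 0     /* assumed bit offset of the start state */
    #define DFA_MASK  31    /* low bits of 'state' hold the offset */

    static bool
    utf8_is_valid(const unsigned char *s, size_t len)
    {
        uint32_t state = DFA_START;

        while (len-- > 0)
            state = Utf8Transition[*s++] >> (state & DFA_MASK);

        /* valid iff we end at the start state (no partial sequence) */
        return (state & DFA_MASK) == DFA_START;
    }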

--
John Naylor
EDB: http://www.enterprisedb.com
From d62f21bf7256d283c83c81b07e49fc33e270e4e3 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 28 Jul 2021 13:53:06 -0400
Subject: [PATCH v23] Add fast paths for validating UTF-8 text

Our previous validator is a traditional one that performs comparisons
and branching on one byte at a time. It's useful in that we always know
exactly how many bytes we have validated, but that precision comes at
a cost. Input validation can show up prominently in profiles of COPY
FROM, and future improvements to COPY FROM such as parallelism or
faster line parsing will put more pressure on input validation. Hence,
supplement with two fast paths, depending on platform:

On machines that support SSE4, use an algorithm described in the
paper "Validating UTF-8 In Less Than One Instruction Per Byte" by
John Keiser and Daniel Lemire. The authors have made available an
open source implementation within the simdjson library (Apache 2.0
license). The lookup tables and some naming conventions were adopted
from this library, but the code was written from scratch.

On other hardware, use a "shift-based" DFA.

Both implementations are highly optimized for blocks of ASCII text,
are relatively free of branches and thus robust against all kinds of
byte patterns, and delay error checking to the very end. With these
algorithms, UTF-8 validation is from anywhere from two to seven times
faster, depending on platform and the distribution of byte sequences
in the input.

The previous coding in pg_utf8_verifystr() is retained for short
strings and for when the fast path returns an error.

Review, performance testing, and additional hacking by: Heikki
Linnakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and
Greg Stark

Discussion:
https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
---
 config/c-compiler.m4 |  28 +-
 configure| 112 ++--
 configure.ac |  61 +++-
 src/Makefile.global.in   |   3 +
 src/common/wchar.c   |  35 +++
 src/include/mb/pg_wchar.h|   7 +
 src/include/pg_config.h.in   |   9 +
 src/include/port/pg_sse42_utils.h| 163 +++
 src/include/port/pg_utf8.h   | 103 +++
 src/port/Makefile|   6 +
 src/port/pg_utf8_fallback.c  | 231 +++
 src/port/pg_utf8_sse42.c | 349 +++
 src/port/pg_utf8_sse42_choose.c  |  68 +
 src/test/regress/expected/conversion.out | 170 +++
 src/test/regress/sql/conversion.sql  | 134 +
 src/tools/msvc/Mkvcbuild.pm  |   4 +
 src/tools/msvc/Solution.pm   |   3 +
 17 files changed, 1442 insertions(+), 44 deletions(-)
 create mode 100644 src/include/port/pg_sse42_utils.h
 create mode 100644 src/include/port/pg_utf8.h
 create mode 100644 src/port/pg_utf8_fallback.c
 create mode 100644 src/port/pg_utf8_sse42.c
 create mode 100644 src/port/pg_utf8_sse42_choose.c

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 780e906ecc..49d592a53c 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -591,36 +591,46 @@ if test x"$pgac_cv_gcc_atomic_int64_cas" = x"yes"; then
   AC_DEFINE(HAVE_GCC__ATOMIC_INT64_CAS, 1, [Define to 1 if you have __atomic_compare_exchange_n(int64 *, int64 *, int64).])
 fi])# PGAC_HAVE_GCC__ATOMIC_INT64_CAS
 
-# PGAC_SSE42_CRC32_INTRINSICS
+# PGAC_SSE42_INTRINSICS
 # ---
 # Check if the compiler supports the x86 CRC instructions added in SSE 4.2,
 # using the _mm_crc32_u8 and _mm_crc32_u32 intrinsic functions. (We don't
 # test the 8-byte variant, _mm_crc32_u64, but it is assumed to be present if
 # the other ones are, on x86-64 platforms)
 #
+# Also, check for support of x86 instructions added in SSSE3 and SSE4.1,
+# in particular _mm_alignr_epi8, _mm_shuffle_epi8, and _mm_testz_si128.
+# We might be able to assume these are understood by the compiler if CRC
+# intrinsics are, but it's better to document our reliance on them here.
+#
+# We don't test for SSE2 intrinsics, as they are assumed to be present if
+# SSE 4.2 intrinsics are.
+#
 # An optional compiler flag can be passed as argument (e.g. 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Robert Haas
On Thu, Aug 26, 2021 at 11:24 AM Bruce Momjian  wrote:
> On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > Anyone see a flaw in that analysis?
>
> I am still waiting to hear the purpose of this preservation.  As long as
> you don't apply the patch, I guess I will just stop asking.

You make it sound like I didn't answer that question the last time you
asked it, but I did.[1] I went back to the previous thread and found
that, in fact, there's at least one email *from you* appearing to
endorse that concept for reasons unrelated to TDE[2] and another where
you appear to agree that it would be useful for TDE to do it.[3]
Stephen Frost also wrote up his discussion during the Unconference and
some of his reasons for liking the idea.[4]

If you've changed your mind about this being a good idea, or if you no
longer think it's useful without TDE, that's fine. Everyone is
entitled to change their opinion. But then please say that straight
out. It baffles me why you're now acting as if it hasn't been
discussed when it clearly has been, and both you and I were
participants in that discussion.

[1] 
https://www.postgresql.org/message-id/ca+tgmob7msyh3vray87usr22uakvvsyy4zbaqw2ao2cfoud...@mail.gmail.com
[2] https://www.postgresql.org/message-id/20210601140949.gc22...@momjian.us
[3] https://www.postgresql.org/message-id/20210527210023.gj5...@momjian.us
[4] 
https://www.postgresql.org/message-id/20210531201652.gy20...@tamriel.snowman.net

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Logical replication - schema change not invalidating the relation cache

2021-08-26 Thread vignesh C
On Fri, Jul 16, 2021 at 10:51 PM vignesh C  wrote:
>
> On Sat, Jul 3, 2021 at 11:23 AM Dilip Kumar  wrote:
> >
> > On Fri, Jul 2, 2021 at 12:03 PM Dilip Kumar  wrote:
> > >
> > > Yeah, this looks like a bug.  I will look at the patch.
> > >
> >
> > While looking into this, I think the main cause of the problem is that
> > schema rename does not invalidate the relation cache right?  I also
> > tried other cases e.g. if there is an open cursor and we rename the
> > schema
> >
> > CREATE SCHEMA sch1;
> > CREATE TABLE sch1.t1(c1 int);
> > insert into sch1.t1 values(1);
> > insert into sch1.t1 values(2);
> > insert into sch1.t1 values(3);
> > BEGIN;
> > DECLARE mycur CURSOR FOR SELECT * FROM sch1.t1;
> > FETCH NEXT FROM mycur ;
> > --At this point rename sch1 to sch2 from another session--
> > FETCH NEXT FROM mycur ;
> > UPDATE sch2.t1 SET c1 = 20 WHERE CURRENT OF mycur;
> > select * from sch2.t1 ;
> >
> > So even after the schema rename the cursor is able to fetch and its
> > also able to update on the same table in the new schema, ideally using
> > CURRENT OF CUR, you can update the same table for which you have
> > declared the cursor.  I am giving this example because this behavior
> > also looks somewhat similar.
>
> It works in this case because it uses the relation id for performing
> the next fetch and the relation id does not get changed after renaming
> the schema. Also since it holds a lock on the relation, alter/drop
> operations will not be allowed. I felt this behavior might be ok.  But
> the original scenario reported is an issue because it replicates the
> data of both the original table and the renamed schema's table.

The previous patch was failing because of the recent test changes made
by commit 201a76183e2, which unified PostgresNode's new() and
get_new_node() methods; the attached patch has been updated accordingly.

Regards,
Vignesh
From e86247f9502727f2c2e5f41489f8bbd4f69b24e4 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Thu, 26 Aug 2021 19:55:35 +0530
Subject: [PATCH v1] Fix for invalidating logical replication relations when
 there is a change in schema.

When a schema is renamed, the relation sync cache was not being
invalidated; fix this by adding an invalidation callback for schema
changes.
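
A minimal sequence of the kind that exposes the stale cache might look
like this (a sketch only; a FOR ALL TABLES publication with an active
subscription is an assumption, not taken verbatim from the report):

    CREATE PUBLICATION pub FOR ALL TABLES;
    CREATE SCHEMA sch1;
    CREATE TABLE sch1.t1 (c1 int);
    INSERT INTO sch1.t1 VALUES (1);   -- replicated; entry gets cached
    ALTER SCHEMA sch1 RENAME TO sch2;
    CREATE SCHEMA sch1;
    CREATE TABLE sch1.t1 (c1 int);
    INSERT INTO sch1.t1 VALUES (2);   -- without the fix, the stale cached
                                      -- entry can misroute this change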
---
 src/backend/replication/pgoutput/pgoutput.c | 51 ++
 src/test/subscription/t/001_rep_changes.pl  | 76 -
 2 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 14d737fd93..1eec9f603d 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -141,6 +141,8 @@ static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Oid relid);
 static void rel_sync_cache_relation_cb(Datum arg, Oid relid);
 static void rel_sync_cache_publication_cb(Datum arg, int cacheid,
 		  uint32 hashvalue);
+static void rel_sync_cache_namespace_cb(Datum arg, int cacheid,
+		uint32 hashvalue);
 static void set_schema_sent_in_streamed_txn(RelationSyncEntry *entry,
 			TransactionId xid);
 static bool get_schema_sent_in_streamed_txn(RelationSyncEntry *entry,
@@ -1068,6 +1070,9 @@ init_rel_sync_cache(MemoryContext cachectx)
 	CacheRegisterSyscacheCallback(PUBLICATIONRELMAP,
   rel_sync_cache_publication_cb,
   (Datum) 0);
+	CacheRegisterSyscacheCallback(NAMESPACEOID,
+  rel_sync_cache_namespace_cb,
+  (Datum) 0);
 }
 
 /*
@@ -1342,6 +1347,52 @@ rel_sync_cache_relation_cb(Datum arg, Oid relid)
 	}
 }
 
+/*
+ * Namespace syscache invalidation callback
+ */
+static void
+rel_sync_cache_namespace_cb(Datum arg, int cacheid, uint32 hashvalue)
+{
+	HASH_SEQ_STATUS status;
+	RelationSyncEntry *entry;
+
+	/*
+	 * We can get here if the plugin was used in SQL interface as the
+	 * RelSchemaSyncCache is destroyed when the decoding finishes, but there
+	 * is no way to unregister the relcache invalidation callback.
+	 */
+	if (RelationSyncCache == NULL)
+		return;
+
+	hash_seq_init(&status, RelationSyncCache);
+	while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
+	{
+		/*
+		 * Reset schema sent status as the relation definition may have changed.
+		 * Also free any objects that depended on the earlier definition.
+		 */
+		entry->schema_sent = false;
+		list_free(entry->streamed_txns);
+		entry->streamed_txns = NIL;
+		if (entry->map)
+		{
+			/*
+			 * Must free the TupleDescs contained in the map explicitly,
+			 * because free_conversion_map() doesn't.
+			 */
+			FreeTupleDesc(entry->map->indesc);
+			FreeTupleDesc(entry->map->outdesc);
+			free_conversion_map(entry->map);
+		}
+		entry->map = NULL;
+
+		if (hash_search(RelationSyncCache,
+						(void *) &entry->relid,
+HASH_REMOVE, NULL) == NULL)
+			elog(ERROR, "hash table corrupted");
+	}
+}
+
 /*
  * Publication relation map syscache invalidation callback
  */
diff --git a/src/test/subscription/t/001_rep_changes.pl 

Re: badly calculated width of emoji in psql

2021-08-26 Thread Jacob Champion
On Wed, 2021-08-25 at 16:15 -0400, John Naylor wrote:
> On Tue, Aug 24, 2021 at 1:50 PM Jacob Champion  wrote:
> >
> > Does there need to be any sanity check for overlapping ranges between
> > the combining and fullwidth sets? The Unicode data on a dev's machine
> > would have to be broken somehow for that to happen, but it could
> > potentially go undetected for a while if it did.
> 
> It turns out I should have done that to begin with. In the Unicode
> data, it apparently happens that a character can be both combining
> and wide, and that will cause ranges to overlap in my scheme:

I was looking for overlaps in my review, but I skipped right over that,
sorry...

On Thu, 2021-08-26 at 11:12 -0400, John Naylor wrote:
> I pushed Jacob's patch with the addendum I shared upthread, plus a
> comment about search order. I also took the liberty of changing the
> author in the CF app to Jacob. Later I'll push detecting non-spacing
> characters beyond the BMP.

Thanks!

--Jacob


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> Anyone see a flaw in that analysis?

I am still waiting to hear the purpose of this preservation.  As long as
you don't apply the patch, I guess I will just stop asking.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: badly calculated width of emoji in psql

2021-08-26 Thread John Naylor
I wrote:
> On Tue, Aug 24, 2021 at 1:50 PM Jacob Champion wrote:

> It seems the logical thing to do is revert my 0001 and 0002 and go back
> to something much closer to Jacob's patch, plus a big comment explaining
> that the order in which the searches happen matters.

I pushed Jacob's patch with the addendum I shared upthread, plus a comment
about search order. I also took the liberty of changing the author in the
CF app to Jacob. Later I'll push detecting non-spacing characters beyond
the BMP.
--
John Naylor
EDB: http://www.enterprisedb.com


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Robert Haas
On Tue, Aug 17, 2021 at 2:50 PM Robert Haas  wrote:
> > Less sure that this is a good idea, though.  In particular, I do not
> > think that you can make it work in the face of
> > alter database template1 rename to oops;
> > create database template1;
>
> That is a really good point. If we can't categorically force the OID
> of those databases to have a particular, fixed value, and based on
> this example that seems to be impossible, then there's always a
> possibility that we might find a value in the old cluster that doesn't
> happen to match what is present in the new cluster. Seen from that
> angle, the problem is really with databases that are pre-existent in
> the new cluster but whose contents still need to be dumped. Maybe we
> could (optionally? conditionally?) drop those databases from the new
> cluster and then recreate them with the OID that we want them to have.

Actually, we do that already. create_new_objects() runs pg_restore
with --create for most databases, but with --clean --create for
template1 and postgres. This means that template1 and postgres will
always be recreated in the new cluster, and other databases are
assumed not to exist in the new cluster and the upgrade will fail if
they unexpectedly do. And the reason why pg_upgrade does that is that
it wants to "propagate [the] database-level properties" of postgres
and template1. So suppose we just make the database OID one of the
database-level properties that we want to propagate. That should
mostly just work, but where can things go wrong?

The only real failure mode is that we try to create a database in the new
cluster and find out that the OID is already in use. If the colliding
OID is >64k, then the user has messed with the new cluster
before doing that. And since pg_upgrade is pretty clearly already
assuming that you shouldn't do that, it's fine to also make that
assumption in this case. We can disregard such cases as user error.

If the new OID that collides is <64k, then it must be colliding with
template0, template1, or postgres in the new cluster, because those
are the only databases that can have such OIDs since, currently, we
don't allow users to specify an OID for a new database. And the
problem cannot be with template1, because we hard-code its OID to 1.
If there is a database with OID 1 in either cluster, it must be
template1, and if there is a database with OID 1 in both clusters, it
must be template1 in both cases, and we'll just drop and recreate it
with OID 1 and everything is fine. So we need only consider template0
and postgres, which are created with system-generated OIDs. And, it
would be no issue if either of those databases had the same OID in the
old and new cluster, so the only possible OID collision is one where
the same system-generated OID was assigned to one of those databases
in the old cluster and to the other in the new cluster.

First consider the case where template0 has OID, say, 13000, in the
old cluster, and postgres has that OID in the new cluster. No problem
occurs, because template0 isn't transferred anyway. The reverse
direction is a problem, though. If postgres had been assigned OID
13000 in the old cluster and, by sheer chance, template0 had that OID
in the new cluster, then the upgrade would fail, because it wouldn't
be able to recreate the postgres database with the correct OID.

But that doesn't seem very difficult to fix. I think all we need to do
is have initdb assign a fixed OID to template0 at creation time. Then,
in any new release to which someone might be trying to upgrade, the
system-generated OID assigned to postgres in the old release can't
match the fixed OID assigned to template0 in the new release, so the
one problem case is ruled out. We do need, however, to make sure that
the assign-my-database-a-fixed-OID syntax is either entirely
restricted to initdb & pg_upgrade or at least that OIDs < 64k can only
be assigned in one of those modes. Otherwise, some creative person
could manufacture new problem cases by setting up the source database
so that the OID of one of their databases matches the fixed OID we
gave to template0 or template1, or the system-generated OID for
postgres in the new cluster.

In short, as far as I can see, all we need to do to preserve database
OIDs across pg_upgrade is:

1. Add a new syntax for creating a database with a given OID (sketched
below), and use it in pg_dump --binary-upgrade.
2. Don't let users use it at least for OIDs <64k, or maybe just don't
let them use it at all.
3. But let initdb use it, and have initdb set the initial OID for
template0 to a fixed value < 1. If the user changes it later, no
problem; the cluster into which they are upgrading won't contain any
databases with high-numbered OIDs.
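
For concreteness, the syntax contemplated in step 1 might look something
like the following; this is purely hypothetical, as no such clause exists
at the time of writing:

    -- emitted by pg_dump --binary-upgrade, not meant for general use:
    CREATE DATABASE mydb WITH OID 17232;

    -- and initdb would pin template0 to a fixed low OID, e.g.:
    CREATE DATABASE template0 WITH OID 4;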

Anyone see a flaw in that analysis?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Printing backtrace of postgres processes

2021-08-26 Thread vignesh C
On Wed, Jul 21, 2021 at 10:56 PM vignesh C  wrote:
>
> On Wed, Jul 21, 2021 at 3:52 PM Bharath Rupireddy 
>  wrote:
> >
> > On Tue, Jul 13, 2021 at 9:03 PM vignesh C  wrote:
> > >
> > > On Wed, May 12, 2021 at 2:27 AM Robert Haas  wrote:
> > > >
> > > > Maybe we should have a role that is specifically for server debugging
> > > > type things. This kind of overlaps with Mark Dilger's proposal to try
> > > > to allow SET for security-sensitive GUCs to be delegated via
> > > > predefined roles. The exact way to divide that up is open to question,
> > > > but it wouldn't seem crazy to me if the same role controlled the
> > > > ability to do this plus the ability to set the GUCs
> > > > backtrace_functions, debug_invalidate_system_caches_always,
> > > > wal_consistency_checking, and maybe a few other things.
> > >
> > > +1 for the idea of having a new role for this. Currently I have
> > > implemented this feature to be supported only for the superuser. If we
> > > are ok with having a new role to handle debugging features, I will
> > > make a 002 patch to handle this.
> >
> > I see that there are a good number of user functions that are
> > accessible only by superuser (I searched for "if (!superuser())" in
> > the code base). I agree with the intention to not overload the
> > superuser anymore and have a few other special roles to delegate the
> > existing superuser-only functions to them. In that case, are we going
> > to revisit and reassign all the existing superuser-only functions?
>
> As Robert pointed out, this idea is based on Mark Dilger's proposal. Mark 
> Dilger is already handling many of them at [1]. I'm proposing this patch only 
> for server debugging functionalities based on Robert's suggestions at [2].
> [1] - https://commitfest.postgresql.org/33/3223/
> [2] - 
> https://www.postgresql.org/message-id/CA%2BTgmoZz%3DK1bQRp0Ug%3D6uMGFWg-6kaxdHe6VSWaxq0U-YkppYQ%40mail.gmail.com

The previous patch was failing because of the recent test changes made
by commit 201a76183e2, which unified PostgresNode's new() and
get_new_node() methods; the attached patch has been updated accordingly.

Regards,
Vignesh
From e99c97e9a8356976e556fc0cc061d6c98d6b58da Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Thu, 26 Aug 2021 17:41:02 +0530
Subject: [PATCH v8] Print backtrace of specified postgres process.

The idea here is to implement and expose a pg_print_backtrace function.
Internally, the connected backend sends a SIGUSR1 signal, with
PROCSIG_PRINT_BACKTRACE set, to the postgres backend whose pid matches
the specified process id. Once the target process receives this signal,
it prints its backtrace to the log file based on the logging
configuration; if logging is disabled, the backtrace is printed to the
console where the postmaster was started.
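
Assuming the function behaves as described, usage would be along these
lines (an illustrative sketch, not taken from the patch's tests):

    -- ask backend 12345 to log its current backtrace:
    SELECT pg_print_backtrace(12345);

    -- or target the current backend:
    SELECT pg_print_backtrace(pg_backend_pid());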
---
 doc/src/sgml/func.sgml| 76 +++
 src/backend/postmaster/autovacuum.c   |  7 ++
 src/backend/postmaster/interrupt.c|  8 ++
 src/backend/storage/ipc/procsignal.c  | 18 +
 src/backend/storage/ipc/signalfuncs.c | 46 +++
 src/backend/tcop/postgres.c   |  9 +++
 src/backend/utils/error/elog.c| 20 -
 src/backend/utils/init/globals.c  |  1 +
 src/include/catalog/pg_proc.dat   |  5 ++
 src/include/miscadmin.h   |  2 +
 src/include/storage/procsignal.h  |  2 +
 src/include/utils/elog.h  |  2 +
 .../t/002_print_backtrace_validation.pl   | 73 ++
 13 files changed, 265 insertions(+), 4 deletions(-)
 create mode 100644 src/test/modules/test_misc/t/002_print_backtrace_validation.pl

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 78812b2dbe..de7d341ee0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24832,6 +24832,32 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_print_backtrace
+
+pg_print_backtrace ( pid integer )
+boolean
+   
+   
+Send a request to the backend with the specified process ID to log its
+backtrace. The backtrace will be logged at message level
+LOG. It will appear in the server log based on the
+log configuration set (See  for
+more information), but will not be sent to the client regardless of
+. A backtrace will identify
+where exactly the backend process is currently executing. This may be
+useful to developers to diagnose stuck processes and other problems.
+This feature is not supported for the postmaster, logger, checkpointer,
+walwriter, background writer or statistics collector process. This
+feature will be available if PostgreSQL was built with the ability to
+capture backtrace. If not available, the function will return false
+and show a WARNING. Only superusers can request 

Re: prevent immature WAL streaming

2021-08-26 Thread Alvaro Herrera
I'm again distracted by something else, so here's what will have to
pass for v3 for the time being.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 24165ab03e..43495b8260 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -731,8 +731,10 @@ typedef struct XLogCtlData
 	 */
 	XLogSegNo	lastNotifiedSeg;
 	XLogSegNo	earliestSegBoundary;
+	XLogRecPtr	earliestSegBoundaryStartPtr;
 	XLogRecPtr	earliestSegBoundaryEndPtr;
 	XLogSegNo	latestSegBoundary;
+	XLogRecPtr	latestSegBoundaryStartPtr;
 	XLogRecPtr	latestSegBoundaryEndPtr;
 
 	slock_t		segtrack_lck;	/* locks shared variables shown above */
@@ -932,7 +934,7 @@ static void RemoveXlogFile(const char *segname, XLogSegNo recycleSegNo,
 		   XLogSegNo *endlogSegNo);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
-static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr pos);
+static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr startpos, XLogRecPtr endpos);
 static void CleanupBackupHistory(void);
 static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
@@ -1022,6 +1024,8 @@ XLogInsertRecord(XLogRecData *rdata,
 			   info == XLOG_SWITCH);
 	XLogRecPtr	StartPos;
 	XLogRecPtr	EndPos;
+	XLogSegNo	StartSeg;
+	XLogSegNo	EndSeg;
 	bool		prevDoPageWrites = doPageWrites;
 
 	/* we assume that all of the record header is in the first chunk */
@@ -1157,6 +1161,31 @@ XLogInsertRecord(XLogRecData *rdata,
 		 */
 	}
 
+	/*
+	 * Before releasing our WAL insertion lock, register that we crossed the
+	 * segment boundary if that occurred.  We need to do it with the lock held
+	 * for GetSafeFlushRecPtr's sake: otherwise it could see the WAL flush
+	 * point advance but not see the registration, which would lead it to
+	 * wrongly conclude that our flush point is safe to use.
+	 */
+	if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
+	{
+		XLByteToSeg(StartPos, StartSeg, wal_segment_size);
+		XLByteToSeg(EndPos, EndSeg, wal_segment_size);
+
+		/*
+		 * Register our crossing the segment boundary if that occurred.
+		 *
+		 * Note that we did not use XLByteToPrevSeg() for determining the
+		 * ending segment.  This is so that a record that fits perfectly into
+		 * the end of the segment causes said segment to get marked ready for
+		 * archival immediately.
+		 */
+		if (StartSeg != EndSeg &&
+			(XLogArchivingActive() || XLogStandbyInfoActive()))
+			RegisterSegmentBoundary(EndSeg, StartPos, EndPos);
+	}
+
 	/*
 	 * Done! Let others know that we're finished.
 	 */
@@ -1168,27 +1197,10 @@ XLogInsertRecord(XLogRecData *rdata,
 
 	/*
 	 * If we crossed page boundary, update LogwrtRqst.Write; if we crossed
-	 * segment boundary, register that and wake up walwriter.
+	 * segment boundary, wake up walwriter.
 	 */
 	if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
 	{
-		XLogSegNo	StartSeg;
-		XLogSegNo	EndSeg;
-
-		XLByteToSeg(StartPos, StartSeg, wal_segment_size);
-		XLByteToSeg(EndPos, EndSeg, wal_segment_size);
-
-		/*
-		 * Register our crossing the segment boundary if that occurred.
-		 *
-		 * Note that we did not use XLByteToPrevSeg() for determining the
-		 * ending segment.  This is so that a record that fits perfectly into
-		 * the end of the segment causes the latter to get marked ready for
-		 * archival immediately.
-		 */
-		if (StartSeg != EndSeg && XLogArchivingActive())
-			RegisterSegmentBoundary(EndSeg, EndPos);
-
 		/*
 		 * Advance LogwrtRqst.Write so that it includes new block(s).
 		 *
@@ -2649,7 +2661,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 
 LogwrtResult.Flush = LogwrtResult.Write;	/* end of page */
 
-if (XLogArchivingActive())
+if (XLogArchivingActive() || XLogStandbyInfoActive())
 	NotifySegmentsReadyForArchive(LogwrtResult.Flush);
 
 XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
@@ -2739,7 +2751,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 		SpinLockRelease(&XLogCtl->info_lck);
 	}
 
-	if (XLogArchivingActive())
+	if (XLogArchivingActive() || XLogStandbyInfoActive())
 		NotifySegmentsReadyForArchive(LogwrtResult.Flush);
 }
 
@@ -4398,7 +4410,7 @@ ValidateXLOGDirectoryStructure(void)
  * to delay until the end segment is known flushed.
  */
 static void
-RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr endpos)
+RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr startpos, XLogRecPtr endpos)
 {
 	XLogSegNo	segno PG_USED_FOR_ASSERTS_ONLY;
 
@@ -4415,6 +4427,7 @@ RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr endpos)
 	if (XLogCtl->earliestSegBoundary == MaxXLogSegNo)
 	{
 		XLogCtl->earliestSegBoundary = seg;
+		XLogCtl->earliestSegBoundaryStartPtr = startpos;
 		XLogCtl->earliestSegBoundaryEndPtr = endpos;
 	}
 	else if (seg > XLogCtl->earliestSegBoundary &&
@@ -4422,6 +4435,7 @@ 

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 03:59:49PM +0200, Daniel Gustafsson wrote:
> > On 26 Aug 2021, at 15:43, Julien Rouhaud  wrote:
> > 
> > On Thu, Aug 26, 2021 at 21:38, Daniel Gustafsson wrote:
> > > On 26 Aug 2021, at 15:09, Bruce Momjian wrote:
> > 
> > > Basically, pg_upgrade has avoided any backend changes that could be
> > > controlled by GUCs and I am not sure we want to start adding such
> > > changes for just this.
> > 
> > In principle I think it’s sound to try to avoid backend changes where
> > possible without sacrificing robustness.
> > 
> > I agree, but it seems quite more likely that an extension relying on a 
> > bgworker changes this guc, compared to an extension forcing autovacuum to 
> > be on for instance. 
> 
> Agreed, in this particular case I think there is merit to the idea of
> enforcing it in the backend.

OK, works for me

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Identify missing publications from publisher while create/alter subscription.

2021-08-26 Thread vignesh C
On Thu, Jul 15, 2021 at 5:57 PM vignesh C  wrote:
>
> On Tue, Jul 6, 2021 at 8:09 PM vignesh C  wrote:
> >
> > On Wed, Jun 30, 2021 at 8:23 PM vignesh C  wrote:
> > >
> > > On Sun, Jun 6, 2021 at 11:55 AM vignesh C  wrote:
> > > >
> > > > On Fri, May 7, 2021 at 6:44 PM vignesh C  wrote:
> > > > >
> > > > > Thanks for the comments, the attached patch has the fix for the same.
> > > >

The patch was not applying on HEAD; attached is a patch rebased on HEAD.

Regards,
Vignesh
From 36e522ba31829c76e6c4c2069f2806c653aecf62 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Thu, 26 Aug 2021 16:32:56 +0530
Subject: [PATCH v11] Identify missing publications from publisher while
 create/alter subscription.

Creating/altering a subscription currently succeeds even when a specified
publication does not exist on the publisher. This patch checks that the
specified publications are present on the publisher and throws an error
if any of them is missing.
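
With the patch applied, the option would be exercised like this (a sketch
based on the patch's own syntax; the connection details are made up):

    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=pubhost dbname=postgres'
        PUBLICATION mypub
        WITH (validate_publication = true);
    -- fails with an error if "mypub" does not exist on the publisher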
---
 doc/src/sgml/ref/alter_subscription.sgml  |  13 ++
 doc/src/sgml/ref/create_subscription.sgml |  18 +-
 src/backend/commands/subscriptioncmds.c   | 207 +++---
 src/bin/psql/tab-complete.c   |  14 +-
 src/test/subscription/t/007_ddl.pl|  68 ++-
 5 files changed, 287 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 835be0d2a4..d5b28e9afa 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -163,6 +163,19 @@ ALTER SUBSCRIPTION name RENAME TO <
  
 

+
+   
+validate_publication (boolean)
+
+ 
+  When true, the command verifies if all the specified publications
+  that are being subscribed to are present in the publisher and throws
+  an error if any of the publication doesn't exist. The default is
+  false.
+ 
+
+   
+
   
 

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 702934eba1..a39c85b2bb 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -207,8 +207,9 @@ CREATE SUBSCRIPTION subscription_nameCREATE SUBSCRIPTION
   should connect to the publisher at all.  Setting this to
   false will change default values of
-  enabled, create_slot and
-  copy_data to false.
+  enabled, create_slot,
+  copy_data and
+  validate_publication to false.
  
 
  
@@ -266,6 +267,19 @@ CREATE SUBSCRIPTION subscription_name

+
+   
+validate_publication (boolean)
+
+ 
+  When true, the command verifies if all the specified publications
+  that are being subscribed to are present in the publisher and throws
+  an error if any of the publication doesn't exist. The default is
+  false.
+ 
+
+   
+
   
 

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index c47ba26369..8d2aaf72c6 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -60,6 +60,7 @@
 #define SUBOPT_BINARY0x0080
 #define SUBOPT_STREAMING			0x0100
 #define SUBOPT_TWOPHASE_COMMIT		0x0200
+#define SUBOPT_VALIDATE_PUB			0x0400
 
 /* check if the 'val' has 'bits' set */
 #define IsSet(val, bits)  (((val) & (bits)) == (bits))
@@ -81,6 +82,7 @@ typedef struct SubOpts
 	bool		binary;
 	bool		streaming;
 	bool		twophase;
+	bool		validate_publication;
 } SubOpts;
 
 static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
@@ -128,6 +130,8 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 		opts->streaming = false;
 	if (IsSet(supported_opts, SUBOPT_TWOPHASE_COMMIT))
 		opts->twophase = false;
+	if (IsSet(supported_opts, SUBOPT_VALIDATE_PUB))
+		opts->validate_publication = false;
 
 	/* Parse options */
 	foreach(lc, stmt_options)
@@ -247,6 +251,15 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 			opts->specified_opts |= SUBOPT_TWOPHASE_COMMIT;
 			opts->twophase = defGetBoolean(defel);
 		}
+		else if (IsSet(supported_opts, SUBOPT_VALIDATE_PUB) &&
+ strcmp(defel->defname, "validate_publication") == 0)
+		{
+			if (IsSet(opts->specified_opts, SUBOPT_VALIDATE_PUB))
+errorConflictingDefElem(defel, pstate);
+
+			opts->specified_opts |= SUBOPT_VALIDATE_PUB;
+			opts->validate_publication = defGetBoolean(defel);
+		}
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
@@ -285,10 +298,19 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 	 errmsg("%s and %s are mutually exclusive options",
 			"connect = false", "copy_data = true")));
 
+		if (opts->validate_publication &&
+			IsSet(supported_opts, SUBOPT_VALIDATE_PUB) &&

Re: Update of partition key on foreign server

2021-08-26 Thread Ilya Gladyshev
On 2 Aug 2021, at 15:29, Ilya Gladyshev wrote:
>
> Hi,
>
> I am currently looking into a partition constraint violation that occurs
> on update of a partition key on a foreign server. To reproduce it you
> can run:
>
> On server 1 using port 5432:
>
>   create extension postgres_fdw;
>   create table players (id integer, name text) partition by list(name);
>   create table players_0 partition of players for values in ('name1');
>   create server neighbor foreign data wrapper postgres_fdw
>     options (host 'localhost', port '5433', dbname 'postgres');
>   create foreign table players_1 partition of players for values in
>     ('name2') server neighbor;
>   create user mapping for current_user server neighbor;
>
> On server 2 using port 5433:
>
>   create extension postgres_fdw;
>   create server neighbor foreign data wrapper postgres_fdw
>     options (host 'localhost', port '5432', dbname 'postgres');
>   create table players (id integer, name text) partition by list(name);
>   create table players_1 partition of players for values in ('name2');
>   create user mapping for current_user server neighbor;
>   create foreign table players_0 partition of players for values in
>     ('name1') server neighbor;
>
>   insert into players values (1, 'name1');
>   update players set name='name2' where name='name1';
>   ERROR:  new row for relation "players_0" violates partition constraint
>   DETAIL:  Failing row contains (1, name2).
>   CONTEXT:  remote SQL command: UPDATE public.players_0 SET name =
>   'name2'::text WHERE ((name = 'name1'::text))
>
> From what I have read on the mailing list, I understand that this is a
> known problem, but I haven't found any thread discussing it in
> particular. Is this something that needs fixing? If it is, I want to try
> to do it, but I'm wondering if there are any known caveats and looking
> for any tips on how to implement it.
>
> My understanding is that this should be implemented in a similar way to
> how the row is routed from a local partition in ExecCrossPartitionUpdate,
> so the update should be replaced with a foreign delete + local/foreign
> insert. In addition, a direct update should be forbidden when the query
> modifies the partition key. I'm probably missing a lot of details (feel
> free to point them out), but is the general idea correct? I will be
> grateful for any feedback.
>
> Thanks,
> Ilya Gladyshev

I have developed a simple patch to fix this; while I'm not fully satisfied
with it, it gets the job done.

From what I have read in the docs, partition constraints are currently not
checked for foreign tables, because they must be enforced on the remote
server anyway (there might be many other ways to access the data source
besides the FDW), and thus there's no point in doing that locally.
However, in the case of a foreign partition update, checking the
constraints locally can be used to allow routing from remote sources, so I
don't feel that this point is valid in this case.

In message [1] there's also a mention of the possibility of triggers
existing on the foreign server; transforming the update to delete+insert
will cause these triggers to be skipped. While that is true, I feel that
partition pruning already has a similar effect, as it allows skipping
scans of foreign partitions.

The only way I came up with to both fix the update and have the triggers
work is to use the parent partitioned table as the target of the foreign
update (the FDW request would then be "UPDATE public.players …"); while
this is possible, it requires the foreign server to have the same
partitioned table, which seems quite restrictive to me. Please let me know
if I'm missing something or there is a better way to do it.

Thanks,
Ilya Gladyshev

[1] https://www.postgresql.org/message-id/5A93F487.4080602%40lab.ntt.co.jp

0001-Partition-key-update-in-foreign-table.patch
Description: Binary data


Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Denis Laxalde

Bruce Momjian wrote:
> On Thu, Aug 26, 2021 at 03:38:23PM +0200, Daniel Gustafsson wrote:
> > > On 26 Aug 2021, at 15:09, Bruce Momjian wrote:
> > > On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:
> > 
> > >> .. I still think that
> > >> changing bgworker_should_start_now() is a better option.
> > > 
> > > I am not sure.  We have a lot of pg_upgrade code that turns off things
> > > like autovacuum and that has worked fine:
> > 
> > True, but there are also conditionals on IsBinaryUpgrade for starting the
> > autovacuum launcher in the postmaster, so there is some precedent.
> 
> Oh, I was not aware of that.

If I understand correctly, autovacuum is turned off by pg_upgrade code
only if the old cluster does not support the -b flag (prior to 9.1
apparently). Otherwise, this is indeed handled by IsBinaryUpgrade.





Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Daniel Gustafsson
> On 26 Aug 2021, at 15:43, Julien Rouhaud  wrote:
> 
> On Thu, Aug 26, 2021 at 21:38, Daniel Gustafsson wrote:
> > On 26 Aug 2021, at 15:09, Bruce Momjian wrote:
> 
> > Basically, pg_upgrade has avoided any backend changes that could be
> > controlled by GUCs and I am not sure we want to start adding such
> > changes for just this.
> 
> In principle I think it’s sound to try to avoid backend changes where possible
> without sacrificing robustness.
> 
> I agree, but it seems quite more likely that an extension relying on a 
> bgworker changes this guc, compared to an extension forcing autovacuum to be 
> on for instance. 

Agreed, in this particular case I think there is merit to the idea of enforcing
it in the backend.

--
Daniel Gustafsson   https://vmware.com/





Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Julien Rouhaud
On Thu, Aug 26, 2021 at 21:38, Daniel Gustafsson wrote:

> > On 26 Aug 2021, at 15:09, Bruce Momjian  wrote:
>
> > Basically, pg_upgrade has avoided any backend changes that could be
> > controlled by GUCs and I am not sure we want to start adding such
> > changes for just this.
>
> In principle I think it’s sound to try to avoid backend changes where
> possible without sacrificing robustness.
>

I agree, but it seems quite more likely that an extension relying on a
bgworker changes this guc, compared to an extension forcing autovacuum to
be on for instance.



Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 03:38:23PM +0200, Daniel Gustafsson wrote:
> > On 26 Aug 2021, at 15:09, Bruce Momjian  wrote:
> > On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:
> 
> >> .. I still think that
> >> changing bgworker_should_start_now() is a better option.
> > 
> > I am not sure.  We have a lot of pg_upgrade code that turns off things
> > like autovacuum and that has worked fine:
> 
> True, but there are also conditionals on IsBinaryUpgrade for starting the
> autovacuum launcher in the postmaster, so there is some precedent.

Oh, I was not aware of that.

> > Basically, pg_upgrade has avoided any backend changes that could be
> > controlled by GUCs and I am not sure we want to start adding such
> > changes for just this.
> 
> In principle I think it’s sound to try to avoid backend changes where possible
> without sacrificing robustness.

Agreed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Daniel Gustafsson
> On 26 Aug 2021, at 15:09, Bruce Momjian  wrote:
> On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:

>> .. I still think that
>> changing bgworker_should_start_now() is a better option.
> 
> I am not sure.  We have a lot of pg_upgrade code that turns off things
> like autovacuum and that has worked fine:

True, but there are also conditionals on IsBinaryUpgrade for starting the
autovacuum launcher in the postmaster, so there is some precedent.

> Basically, pg_upgrade has avoided any backend changes that could be
> controlled by GUCs and I am not sure we want to start adding such
> changes for just this.

In principle I think it’s sound to try to avoid backend changes where possible
without sacrificing robustness.

--
Daniel Gustafsson   https://vmware.com/





Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:
> On Thu, Aug 26, 2021 at 3:15 PM Denis Laxalde  
> wrote:
> >
> > Michael Paquier a écrit :
> > >> @@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
> > >>   static bool
> > >>   bgworker_should_start_now(BgWorkerStartTime start_time)
> > >>   {
> > >> +if (IsBinaryUpgrade)
> > >> +return false;
> > >> +
> > > Using -c max_worker_processes=0 would just have the same effect, no?
> > > So we could just patch pg_upgrade's server.c to get the same level of
> > > protection?
> >
> > Yes, same effect indeed. This would log "too many background workers"
> > messages in pg_upgrade logs, though.
> > See attached patch implementing this suggestion.
> 
> I disagree.  It can appear to have the same effect but it's not
> guaranteed.  Any module in shared_preload_libraries could stick a
> "max_worker_processes +=X" if it thinks it should account for its own
> ressources.  That may not be something encouraged, but it's definitely
> possible (and I think Andres recently mentioned that some extensions
> do things like that, although maybe for other GUCs) and could result
> in a corruption of a pg_upgrade'd cluster, so I still think that
> changing bgworker_should_start_now() is a better option.

I am not sure.  We have a lot of pg_upgrade code that turns off things
like autovacuum and that has worked fine:

snprintf(cmd, sizeof(cmd),
 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start",
 cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
 (cluster->controldata.cat_ver >=
  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
 " -c autovacuum=off -c autovacuum_freeze_max_age=20",
 (cluster == &old_cluster) ?
 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c vacuum_defer_cleanup_age=0" : "",
 cluster->pgopts ? cluster->pgopts : "", socket_string);

Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





AIX: Symbols are missing in libpq.a

2021-08-26 Thread REIX, Tony
Hi,

While porting postgresql-odbc v13 to AIX, we have found that (at least) 2
symbols are missing in libpq.a as provided by the port of PostgreSQL v13.1
to AIX 7.1 by the BullFreeware project:

pg_char_to_encoding
pg_encoding_to_char

Looking at the details, it appears that these symbols are present in
version 12.8 but still missing in 13.4. Something has changed between
v12 and v13.

Looking at more details, the way libpq.a is built on AIX is different from the 
way libpq.so is built on Linux.
On Linux, the file "exports.txt" is used for building the list of symbols to be 
exported.
On AIX, the tool mkldexport.sh is used for dynamically generating the
symbols to be exported.
And it appears that 5 symbols (including the 2 above) are missing on AIX;
we don't know why.

A solution is to merge the two lists of symbols to be exported into one
list. This is done by the patch attached here. The patch:
  - adds a new 11-line script ./src/backend/port/aix/mergeldexport.sh,
which performs the merge only if the file exports.txt exists
  - adds the use of this script in ./src/Makefile.shlib, for AIX only
  - adds the definition of the variable MERGELDEXPORT in
./src/makefiles/Makefile.aix
Very simple.
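
As an illustration only -- the real 11-line script is in the attached
patch, and the names and paths below are assumptions -- the merge could
look roughly like this:

    #!/bin/sh
    # mergeldexport.sh <lib> <exports.txt>: sketch, not the attached script.
    # Details such as the '#!' header line that mkldexport.sh emits are
    # glossed over here.
    MKLDEXPORT="`dirname $0`/mkldexport.sh"
    if [ -f "$2" ]; then
        # union of the dynamically generated symbols and the static list
        ( "$MKLDEXPORT" "$1"; grep -v '^#' "$2" ) | sort -u
    else
        "$MKLDEXPORT" "$1"
    fi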

I suggest applying the change for v14.

Regards/Cordialement,

Tony Reix

tony.r...@atos.net

ATOS / Bull SAS
ATOS Expert
IBM-Bull Cooperation Project: Architect & OpenSource Technical Leader
1, rue de Provence - 38432 ECHIROLLES - FRANCE
www.atos.net


postgresql-14.1-MergeExports.patch
Description: postgresql-14.1-MergeExports.patch


Re: Skipping logical replication transactions on subscriber side

2021-08-26 Thread Masahiko Sawada
On Thu, Aug 26, 2021 at 9:11 PM Amit Kapila  wrote:
>
> On Thu, Aug 26, 2021 at 4:42 PM Masahiko Sawada  wrote:
> >
> > On Thu, Aug 26, 2021 at 3:09 PM Amit Kapila  wrote:
> > > 1.
> > > + if (errarg->rel)
> > > + appendStringInfo(&buf, _(" for replication target relation \"%s.%s\""),
> > > + errarg->rel->remoterel.nspname,
> > > + errarg->rel->remoterel.relname);
> > > +
> > > + if (errarg->remote_attnum >= 0)
> > > + appendStringInfo(&buf, _(" column \"%s\""),
> > > + errarg->rel->remoterel.attnames[errarg->remote_attnum]);
> > >
> > > Isn't it better if 'remote_attnum' check is inside if (errargrel)
> > > check? It will be weird to print column information without rel
> > > information and in the current code, we don't set remote_attnum
> > > without rel. The other possibility could be to have an Assert for rel
> > > in 'remote_attnum' check.
> >
> > Agreed to check 'remote_attnum' inside "if(errargrel)".
> >
>
> Okay, changed accordingly. Additionally, I have changed the code which
> sets timestamp to (unset) when it is 0 so that it won't display the
> timestamp in that case. I have made few other cosmetic changes in the
> attached patch. See and let me know what you think of it?

Thank you for the patch!

Agreed with these changes. The patch looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: pg_receivewal starting position

2021-08-26 Thread Ronan Dunklau
Le jeudi 29 juillet 2021, 11:09:40 CEST Ronan Dunklau a écrit :
> Patch 0001 adds the new READ_REPLICATION_SLOT command.
> It returns for a given slot the type, restart_lsn, flush_lsn,
> restart_lsn_timeline and flush_lsn_timeline.
> The timelines are determined by reading the current timeline history, and
> finding the timeline where we may find the record. I didn't find explicit
> test for eg IDENTIFY_SYSTEM so didn't write one either for this new
> command, but it is tested indirectly in patch 0002.
> 
> Patch 0002 makes pg_receivewal use that command if we use a replication slot
> and the command is available, and use the restart_lsn and
> restart_lsn_timeline as a starting point. It also adds a small test to
> check that we start back from the previous restart_lsn instead of the
> current flush position when our destination directory does not contain any
> WAL file.
> 
> I also noticed we don't test following a timeline switch. It would probably
> be good to add that, both for the case where we determine the previous
> timeline from the archived segments and when it comes from the new command.
> What do you think ?

Following the discussion at [1], I refactored the implementation into 
streamutil and added a third patch making use of it in pg_basebackup itself in 
order to fail early if the replication slot doesn't exist, so please find 
attached v2 for that.

Best regards,

[1]: https://www.postgresql.org/message-id/flat/CAD21AoDYmv0yJMQnWtCx_kZGwVZnkQSTQ1re2JNSgM0k37afYQ%40mail.gmail.com

-- 
Ronan Dunklau
From fff8786049326864d3ef8fe4539e1829f933f32f Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Wed, 28 Jul 2021 16:34:54 +0200
Subject: [PATCH v2 1/3] Add READ_REPLICATION_SLOT command.

This commit introduces a new READ_REPLICATION_SLOT command.
This command is used to read information about a replication slot when
using a physical replication connection.

In this first version it returns the slot type, restart_lsn, flush_lsn and
the timeline of the restart_lsn and flush_lsn, which are obtained by following the
current timeline history.
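
On a physical replication connection, the command would be used like this
(an illustrative session; the output columns follow the documentation
added below, the values are made up):

    $ psql "dbname=postgres replication=true" -c "READ_REPLICATION_SLOT myslot"
       type   | restart_lsn | confirmed_flush_lsn | restart_lsn_timeline | confirmed_flush_lsn_timeline
    ----------+-------------+---------------------+----------------------+------------------------------
     physical | 0/15D68D0   |                     |                    1 |
    (1 row)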
---
 doc/src/sgml/protocol.sgml |  62 +++
 src/backend/replication/repl_gram.y|  18 -
 src/backend/replication/repl_scanner.l |   1 +
 src/backend/replication/walsender.c| 106 +
 src/include/nodes/nodes.h  |   1 +
 src/include/nodes/replnodes.h  |  10 +++
 6 files changed, 197 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a232546b1d..6207171426 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2052,6 +2052,68 @@ The commands accepted in replication mode are:
 
   
 
+  
+READ_REPLICATION_SLOT slot_name
+  READ_REPLICATION_SLOT
+
+
+  
+  Read information about the named replication slot.  This is useful to determine which WAL location we should be asking the server to start streaming at.
+  
+  
+  In response to this command, the server will return a one-row result set, containing the following fields:
+
+  
+type (text)
+
+  
+   The replication slot's type, either physical or logical
+  
+
+  
+
+  
+restart_lsn (text)
+
+  
+   The replication slot's restart_lsn.
+  
+
+  
+
+  
+confirmed_flush_lsn (text)
+
+  
+   The replication slot's confirmed_flush LSN.
+  
+
+  
+
+  
+restart_lsn_timeline (int4)
+
+  
+   The timeline ID for the restart_lsn position, when following the current timeline
+   history
+  
+
+  
+
+  
+confirmed_flush_lsn_timeline (int4)
+
+  
+   The timeline ID for the confirmed_flush_lsn position, when following the current timeline
+   history
+  
+
+  
+
+  
+
+  
+
   
 START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE tli ]
  START_REPLICATION
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..7298f44008 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type 	command
 %type 	base_backup start_replication start_logical_replication
 create_replication_slot drop_replication_slot identify_system
-timeline_history show sql_cmd
+

Re: Skipping logical replication transactions on subscriber side

2021-08-26 Thread Amit Kapila
On Thu, Aug 26, 2021 at 4:42 PM Masahiko Sawada  wrote:
>
> On Thu, Aug 26, 2021 at 3:09 PM Amit Kapila  wrote:
> > 1.
> > + if (errarg->rel)
> > + appendStringInfo(&buf, _(" for replication target relation \"%s.%s\""),
> > + errarg->rel->remoterel.nspname,
> > + errarg->rel->remoterel.relname);
> > +
> > + if (errarg->remote_attnum >= 0)
> > + appendStringInfo(&buf, _(" column \"%s\""),
> > + errarg->rel->remoterel.attnames[errarg->remote_attnum]);
> >
> > Isn't it better if 'remote_attnum' check is inside if (errargrel)
> > check? It will be weird to print column information without rel
> > information and in the current code, we don't set remote_attnum
> > without rel. The other possibility could be to have an Assert for rel
> > in 'remote_attnum' check.
>
> Agreed to check 'remote_attnum' inside "if(errargrel)".
>

Okay, changed accordingly. Additionally, I have changed the code which
sets timestamp to (unset) when it is 0 so that it won't display the
timestamp in that case. I have made few other cosmetic changes in the
attached patch. See and let me know what you think of it?

Note - I have just attached the first patch here, once this is
committed we can focus on others.

-- 
With Regards,
Amit Kapila.


v12-0001-Add-logical-change-details-to-logical-replicatio.patch
Description: Binary data


Re: list of acknowledgments for PG14

2021-08-26 Thread Peter Eisentraut

On 26.08.21 10:48, Daniel Gustafsson wrote:
> > On 26 Aug 2021, at 10:41, Peter Eisentraut wrote:
> >
> > Attached is the plain-text list of acknowledgments for the PG14 release
> > notes, current through today.  Please check for problems such as wrong
> > sorting, duplicate names in different variants, or names in the wrong
> > order etc.  (Note that the current standard is given name followed by
> > surname, independent of cultural origin.)
>
> I would have expected “Ö” (Önder Kalacı) to sort after “Z” but that might only
> be true for my locale?

The sort order is COLLATE "en-x-icu".




Re: Minimal logical decoding on standbys

2021-08-26 Thread Peter Eisentraut
I noticed the tests added in this patch set are very slow.  Here are 
some of the timings:


...
[13:26:59] t/018_wal_optimize.pl  ok13976 ms
[13:27:13] t/019_replslot_limit.pl .. ok10976 ms
[13:27:24] t/020_archive_status.pl .. ok 6190 ms
[13:27:30] t/021_row_visibility.pl .. ok 3227 ms
[13:27:33] t/022_crash_temp_files.pl  ok 2296 ms
[13:27:36] t/023_pitr_prepared_xact.pl .. ok 3601 ms
[13:27:39] t/024_archive_recovery.pl  ok 3937 ms
[13:27:43] t/025_stuck_on_old_timeline.pl ... ok 4348 ms
[13:27:47] t/026_standby_logical_decoding.pl  ok   117730 ms  <<<

Is it possible to improve this?




Re: row filtering for logical replication

2021-08-26 Thread Amit Kapila
On Thu, Aug 26, 2021 at 3:41 PM Peter Smith  wrote:
>
> On Thu, Aug 26, 2021 at 3:00 PM Amit Kapila  wrote:
> >
> > On Thu, Aug 26, 2021 at 9:51 AM Peter Smith  wrote:
> > >
> > > On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 7:37 AM Peter Smith  
> > > > wrote:
> > > > >
> > > > > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > ...
> > > > > >
> > > > > > Hmm, I think the gain via caching is not visible because we are 
> > > > > > using
> > > > > > simple expressions. It will be visible when we use somewhat complex
> > > > > > expressions where expression evaluation cost is significant.
> > > > > > Similarly, the impact of this change will magnify and it will also 
> > > > > > be
> > > > > > visible when a publication has many tables. Apart from performance,
> > > > > > this change is logically correct as well because it would be any way
> > > > > > better if we don't invalidate the cached expressions unless 
> > > > > > required.
> > > > >
> > > > > Please tell me what is your idea of a "complex" row filter expression.
> > > > > Do you just mean a filter that has multiple AND conditions in it? I
> > > > > don't really know if few complex expressions would amount to any
> > > > > significant evaluation costs, so I would like to run some timing tests
> > > > > with some real examples to see the results.
> > > > >
> > > >
> > > > I think this means you didn't even understand or are convinced why the
> > > > patch has cache in the first place. As per your theory, even if we
> > > > didn't have cache, it won't matter but that is not true otherwise, the
> > > > patch wouldn't have it.
> > >
> > > I have never said there should be no caching. On the contrary, my
> > > performance test results [1] already confirmed that caching ExprState
> > > is of benefit for the millions of times it may be used in the
> > > pgoutput_row_filter function. My only doubts are in regard to how much
> > > observable impact there would be re-evaluating the filter expression
> > > just a few extra times by the get_rel_sync_entry function.
> > >
> >
> > I think it depends but why in the first place do you want to allow
> > re-evaluation when there is a way for not doing that?
>
> Because the current code logic of having the "delayed" ExprState
> evaluation does come at some cost.
>

So, now you mixed it with the second point. Here, I was talking about
the need for correct invalidation but you started discussing when to
first time evaluate the expression, both are different things.

>  And the cost is -
> a. Needing an extra condition and more code in the function 
> pgoutput_row_filter
> b. Needing to maintain the additional Node list
>

I am not sure you need (b) above and I think (a) should make the
overall code look clean.

> If we chose not to implement a delayed ExprState cache evaluation then
> there would still be a (one-time) ExprState cache evaluation but it
> would happen whenever get_rel_sync_entry is called (regardless of if
> pgoputput_row_filter is subsequently called). E.g. there can be some
> rebuilds of the ExprState cache if the user calls TRUNCATE.
>

Apart from Truncate, it will also be a waste if any error happens before
actually evaluating the filter. Tomorrow there could be other operations,
like replication of sequences (I have checked that the proposed patch for
sequences uses get_rel_sync_entry), where we don't need to build an
ExprState (as filters might or might not be there). So, it would be
better to avoid cache lookups in those cases if possible. I still think
doing expensive things like preparing expressions should ideally be done
only when required.
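
In code terms, the "delayed" pattern being debated amounts to something
like the following inside pgoutput_row_filter (the field and function
names are taken from the patch under review and are assumptions here):

    /* Build the cached ExprState on first use, not in get_rel_sync_entry(). */
    if (entry->exprstate == NULL && entry->rowfilter != NULL)
        entry->exprstate = pgoutput_row_filter_init_expr(entry->rowfilter);

so code paths that call get_rel_sync_entry() but never evaluate the filter
(errors, TRUNCATE, possibly sequence replication) never pay the
expression-initialization cost.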
--
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2021-08-26 Thread Masahiko Sawada
On Thu, Aug 26, 2021 at 3:09 PM Amit Kapila  wrote:
>
> On Thu, Aug 26, 2021 at 9:50 AM Masahiko Sawada  wrote:
> >
> > On Thu, Aug 26, 2021 at 12:51 PM Amit Kapila  
> > wrote:
> >
> > Yeah, I agree that it's a handy way to detect missing a switch case
> > but I think that we don't necessarily need it in this case. Because
> > there are many places in the code where doing similar things and when
> > it comes to apply_dispatch() it's the entry function to handle the
> > incoming message so it will be unlikely that we miss adding a switch
> > case until the patch gets committed. If we don't move it, we would end
> > up either adding the code resetting the
> > apply_error_callback_arg.command to every message type, adding a flag
> > indicating the message is handled and checking later, or having a big
> > if statement checking if the incoming message type is valid etc.
> >
>
> I was reviewing and making minor edits to your v11-0001* patch and
> noticed that the below parts of the code could be improved:

Thank you for the comments!

> 1.
> + if (errarg->rel)
> + appendStringInfo(&buf, _(" for replication target relation \"%s.%s\""),
> + errarg->rel->remoterel.nspname,
> + errarg->rel->remoterel.relname);
> +
> + if (errarg->remote_attnum >= 0)
> + appendStringInfo(&buf, _(" column \"%s\""),
> + errarg->rel->remoterel.attnames[errarg->remote_attnum]);
>
> Isn't it better if 'remote_attnum' check is inside if (errargrel)
> check? It will be weird to print column information without rel
> information and in the current code, we don't set remote_attnum
> without rel. The other possibility could be to have an Assert for rel
> in 'remote_attnum' check.

Agreed to check 'remote_attnum' inside "if(errargrel)".

>
> 2.
> + /* Reset relation for error callback */
> + apply_error_callback_arg.rel = NULL;
> +
>   logicalrep_rel_close(rel, NoLock);
>
>   end_replication_step();
>
> Isn't it better to reset relation info as the last thing in
> apply_handle_insert/update/delete as you do for a few other
> parameters? There is very little chance of error from those two
> functions but still, it will be good if they ever throw an error and
> it might be clear for future edits in this function that this needs to
> be set as the last thing in these functions.

On Thu, Aug 26, 2021 at 3:30 PM Amit Kapila  wrote:
>
> I see that resetting it before logicalrep_rel_close has an advantage
> that we might not accidentally access some information after close
> which is not there in rel. I am not sure if that is the reason you
> have in mind for resetting it before close.

Yes, that's why I reset the apply_error_callback_arg.rel before
logicalrep_rel_close(), not at the end of the function.

Since the callback function refers to apply_error_callback_arg.rel it
still needs to be valid when an error occurs. Moving it to the end of
the function is no problem for now, but if we always reset relation
info as the last thing, I think that we cannot allow adding changes
between setting relation info and the end of the function (i.g.,
resetting relation info) that could lead to invalidate fields of
apply_error_callback_arg.rel (e.g, freeing a string value etc).

> Note - I can take care of the above points based on whatever we agree
> with, you don't need to send a new version for this.

Thanks!

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Failure of subscription tests with topminnow

2021-08-26 Thread Ajin Cherian
On Thu, Aug 26, 2021 at 2:45 PM Masahiko Sawada  wrote:

> I think it's possible that the order of writing disconnection logs
> and setting the walsender's pid to 0 differs between the processes.
> We set the walsender's pid to 0 in WalSndKill(), which is called as an
> on_shmem_exit callback. Once we set 0, pg_stat_replication no longer
> shows the entry. On the other hand, disconnection logs are written by
> log_disconnections(), which is called as an on_proc_exit callback. That
> is, the following sequence could happen:
>
> 1. the second walsender (pid = 16475) raises an error as the slot is
> already active (held by the first walsender).
> 2. the first walsender (pid = 16336) clears its pid on the shmem.
> 3. the polling query checks the walsender’s pid, and returns true
> (since there is only the second walsender now).
> 4. the second walsender clears its pid on the shmem.
> 5. the second walsender writes disconnection log.
> 6. the first walsender writes disconnection log.

I agree with this.

Attaching a patch against HEAD that modifies this particular script to
also consider the state of the walsender.

regards,
Ajin Cherian
Fujitsu Australia


v1-0001-fix-for-tap-test-failure-seen-in-001_rep_changes.patch
Description: Binary data


Re: Error code for checksum failure in origin.c

2021-08-26 Thread Daniel Gustafsson
> On 26 Aug 2021, at 12:03, Amit Kapila  wrote:
> 
> On Thu, Aug 26, 2021 at 3:18 PM Kasahara Tatsuhito
>  wrote:
>> 
>> Hi.
>> 
>> In the code in src/backend/replication/logical/origin.c,
>> the error code "ERRCODE_CONFIGURATION_LIMIT_EXCEEDED" is given
>> when a checksum check results in an error,
>> but "ERRCODE_ DATA_CORRUPTED" seems to be more appropriate.
>> 
> 
> +1. Your observation looks correct to me and the error code suggested
> by you seems appropriate. We use ERRCODE_DATA_CORRUPTED in
> ReadTwoPhaseFile() for a similar error.

Agreed, +1 for changing this.

--
Daniel Gustafsson   https://vmware.com/





Re: row filtering for logical replication

2021-08-26 Thread Peter Smith
On Thu, Aug 26, 2021 at 3:00 PM Amit Kapila  wrote:
>
> On Thu, Aug 26, 2021 at 9:51 AM Peter Smith  wrote:
> >
> > On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila  wrote:
> > >
> > > On Thu, Aug 26, 2021 at 7:37 AM Peter Smith  wrote:
> > > >
> > > > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > ...
> > > > >
> > > > > Hmm, I think the gain via caching is not visible because we are using
> > > > > simple expressions. It will be visible when we use somewhat complex
> > > > > expressions where expression evaluation cost is significant.
> > > > > Similarly, the impact of this change will magnify and it will also be
> > > > > visible when a publication has many tables. Apart from performance,
> > > > > this change is logically correct as well because it is anyway
> > > > > better if we don't invalidate the cached expressions unless required.
> > > >
> > > > Please tell me what your idea of a "complex" row filter expression
> > > > is. Do you just mean a filter that has multiple AND conditions in
> > > > it? I don't really know if a few complex expressions would amount
> > > > to any significant evaluation cost, so I would like to run some
> > > > timing tests with some real examples to see the results.
> > > >
> > >
> > > I think this means you didn't understand, or aren't convinced, why
> > > the patch has the cache in the first place. As per your theory, even
> > > if we didn't have the cache it wouldn't matter, but that is not
> > > true; otherwise the patch wouldn't have it.
> >
> > I have never said there should be no caching. On the contrary, my
> > performance test results [1] already confirmed that caching ExprState
> > is of benefit for the millions of times it may be used in the
> > pgoutput_row_filter function. My only doubt is about how much
> > observable impact there would be from re-evaluating the filter
> > expression just a few extra times in the get_rel_sync_entry function.
> >
>
> I think it depends, but why allow re-evaluation in the first place
> when there is a way to avoid it?

Because the current code logic of having the "delayed" ExprState
evaluation does come at some cost, namely:
a. needing an extra condition and more code in the function pgoutput_row_filter
b. needing to maintain the additional Node list

If we chose not to implement delayed ExprState cache evaluation, there
would still be a (one-time) ExprState cache evaluation, but it would
happen whenever get_rel_sync_entry is called (regardless of whether
pgoutput_row_filter is subsequently called). E.g., there can be some
rebuilds of the ExprState cache if the user calls TRUNCATE.

I guess I felt the only justification for implementing more
sophisticated cache logic is if it gives a performance gain. But if
there is no observable difference, then maybe it's better to just keep
the code simpler. That is why I have been questioning how much time a
one-time ExprState cache evaluation really takes, and whether a few
extra ones would even be noticeable.
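
To be concrete, the "delayed" shape I am referring to is roughly the
following (the field and function names here are illustrative
assumptions, not the patch's exact code):

/*
 * Sketch only: build the cached ExprState on first use.  The NULL test
 * is the extra per-row condition (cost "a" above), and it relies on
 * keeping the raw qual list in the entry (cost "b").
 */
static bool
pgoutput_row_filter(RelationSyncEntry *entry, TupleTableSlot *slot)
{
	if (entry->exprstate == NULL && entry->qual != NIL)
		entry->exprstate = pgoutput_row_filter_init_expr(linitial(entry->qual));

	/* ... evaluate entry->exprstate against the tuple in slot ... */
	return true;
}

The alternative is to build entry->exprstate unconditionally whenever
get_rel_sync_entry (re)builds the cache entry, which removes both costs
but may build ExprStates for filters that are never evaluated.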

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Error code for checksum failure in origin.c

2021-08-26 Thread Amit Kapila
On Thu, Aug 26, 2021 at 3:18 PM Kasahara Tatsuhito
 wrote:
>
> Hi.
>
> In the code in src/backend/replication/logical/origin.c,
> the error code "ERRCODE_CONFIGURATION_LIMIT_EXCEEDED" is given
> when a checksum check results in an error,
> but "ERRCODE_ DATA_CORRUPTED" seems to be more appropriate.
>

+1. Your observation looks correct to me and the error code suggested
by you seems appropriate. We use ERRCODE_DATA_CORRUPTED in
ReadTwoPhaseFile() for a similar error.

-- 
With Regards,
Amit Kapila.




verify_heapam for sequences?

2021-08-26 Thread Peter Eisentraut



Is there a reason why contrib/amcheck/verify_heapam.c does not want to 
run on sequences?  If I take out the checks, it appears to work.  Is 
this an oversight?  Or if there is a reason, maybe it could be stated in 
a comment, at least.





Error code for checksum failure in origin.c

2021-08-26 Thread Kasahara Tatsuhito
Hi.

In the code in src/backend/replication/logical/origin.c,
the error code "ERRCODE_CONFIGURATION_LIMIT_EXCEEDED" is given
when a checksum check results in an error,
but "ERRCODE_ DATA_CORRUPTED" seems to be more appropriate.


diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 2c191de..65dcd03 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -796,7 +796,7 @@ StartupReplicationOrigin(void)
 	FIN_CRC32C(crc);
 	if (file_crc != crc)
 		ereport(PANIC,
-				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				(errcode(ERRCODE_DATA_CORRUPTED),
 				 errmsg("replication slot checkpoint has wrong checksum %u, expected %u",
 						crc, file_crc)));
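
For context, this sits at the end of the usual pg_crc32c verification
pattern; a minimal sketch of that pattern (not the exact origin.c code;
"data", "len" and "file_crc" stand for the bytes read from the file and
the checksum stored in it) is:

	pg_crc32c	crc;

	INIT_CRC32C(crc);
	COMP_CRC32C(crc, data, len);
	FIN_CRC32C(crc);

	/* a mismatch means the file itself is corrupt */
	if (file_crc != crc)
		ereport(PANIC,
				(errcode(ERRCODE_DATA_CORRUPTED),
				 errmsg("replication slot checkpoint has wrong checksum %u, expected %u",
						crc, file_crc)));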

Thought?

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: row filtering for logical replication

2021-08-26 Thread Peter Smith
On Wed, Aug 25, 2021 at 10:22 AM Euler Taveira  wrote:
>

>
> [By the way, it took some time to extract what you changed. Since we're
> trading patches, I would personally appreciate it if you could send a
> patch on top of the current one. I have some changes too, and it is
> time-consuming to incorporate changes into the main patch.]
>

OK. Sorry for causing you trouble.

Here I am re-posting the ExprState cache changes as an incremental
patch on top of the last rebased row-filter patch (v23).

v25-0001 <--- v23 (last rebased main patch)
v25-0002 ExprState cache mods
v25-0003 ExprState cache extra debug logging (temp)

Hopefully, this will make it easier to deal with this change in isolation.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v25-0001-Row-filter-for-logical-replication.patch
Description: Binary data


v25-0002-ExprState-cache-modifications.patch
Description: Binary data


v25-0003-ExprState-cache-temp-extra-logging.patch
Description: Binary data


Re: cannot access to postgres-git via ssh

2021-08-26 Thread Magnus Hagander
On Thu, Aug 26, 2021 at 9:34 AM Kyotaro Horiguchi
 wrote:
>
> Hello.
>
> I'm having difficulty cloning a repository via ssh.
>
> I understand that the git repository can be accessed via the http, git
> and ssh protocols, and that ssh access uses the ssh public key
> registered in the community account profile. I registered an
> ecdsa-sha2-nistp256 key, which I believe the server accepts.  I waited
> more than an hour after registering the key before attempting the
> operation.

Hi!

ssh-based access only works for repositories where you have explicit
permissions; it does not support anonymous access -- that has to be
over https (recommended) or git.

And specifically, the postgresql.git repo mirror only allows anonymous access.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Trap errors from streaming child in pg_basebackup to exit early

2021-08-26 Thread Daniel Gustafsson
When using pg_basebackup with WAL streaming (-X stream), we have observed a
number of times in production that the streaming child exited prematurely
(through no fault of the code, it seems; most likely due to network
middleboxes), which causes the backup to fail, but only after it has run to
completion.  On long-running backups this can consume a lot of time before
it's noticed.

By trapping the failure of the streaming process we can instead exit early,
allowing the user to fix the problem and/or restart the process.

The attached adds a SIGCHLD handler on Unix, and catches the return value
from the Windows thread, in order to break out of the main loop early.  It
still needs a test, and proper testing on Windows, but early feedback on the
approach would be appreciated.
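
For readers following along, the Unix side amounts to something like the
sketch below (the names here are illustrative assumptions, not necessarily
what the attached patch uses):

/*
 * Minimal sketch: trap SIGCHLD, set a flag, and let the main receive
 * loop notice it and bail out early.
 */
static volatile sig_atomic_t bgchild_exited = false;

static void
sigchld_handler(SIGNAL_ARGS)
{
	bgchild_exited = true;
}

/* during setup, after forking the WAL-streaming child: */
pqsignal(SIGCHLD, sigchld_handler);

/* in the loop receiving base backup data: */
while (!bgchild_exited)
{
	/* ... receive and write tar data ... */
}
if (bgchild_exited)
{
	pg_log_error("background WAL receiver terminated unexpectedly");
	exit(1);
}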

--
Daniel Gustafsson   https://vmware.com/



0001-Quick-exit-on-log-stream-child-exit-in-pg_basebackup.patch
Description: Binary data


Re: list of acknowledgments for PG14

2021-08-26 Thread Etsuro Fujita
On Thu, Aug 26, 2021 at 5:41 PM Peter Eisentraut
 wrote:
> Attached is the plain-text list of acknowledgments for the PG14 release
> notes, current through today.  Please check for problems such as wrong
> sorting, duplicate names in different variants, or names in the wrong
> order etc.  (Note that the current standard is given name followed by
> surname, independent of cultural origin.)

Thanks as usual!  I think these are Japanese names and in the wrong order:

Katsuragi Yuta
Kobayashi Hisanori
Kondo Yuta
Matsumura Ryo

Best regards,
Etsuro Fujita




Re: prevent immature WAL streaming

2021-08-26 Thread Kyotaro Horiguchi
At Thu, 26 Aug 2021 03:24:48 +, "Bossart, Nathan"  
wrote in 
> On 8/25/21, 5:40 PM, "Kyotaro Horiguchi"  wrote:
> > At Wed, 25 Aug 2021 18:18:59 +, "Bossart, Nathan"  
> > wrote in
> >> Let's say we have the following situation (F = flush, E = earliest
> >> registered boundary, and L = latest registered boundary), and let's
> >> assume that each segment has a cross-segment record that ends in the
> >> next segment.
> >>
> >> F E L
> >> |-|-|-|-|-|-|-|-|
> >>1 2 3 4 5 6 7 8
> >>
> >> Then, we write out WAL to disk and create .ready files as needed.  If
> >> we didn't flush beyond the latest registered boundary, the latest
> >> registered boundary now becomes the earliest boundary.
> >>
> >>   F E
> >> |-|-|-|-|-|-|-|-|
> >>1 2 3 4 5 6 7 8
> >>
> >> At this point, the earliest segment boundary past the flush point is
> >> before the "earliest" boundary we are tracking.
> >
> > We know we have some cross-segment records in the region [E L], so we
> > cannot add a .ready file if the flush is in that region. I haven't
> > looked at the latest patch (or I may misunderstand the discussion
> > here), but I think we shouldn't move E before F exceeds the previous
> > L (the one in the first picture above).  Things are done that way in
> > my ancient proposal in [1].
> 
> The strategy in place ensures that we track a boundary that doesn't
> change until the flush position passes it as well as the latest
> registered boundary.  I think it is important that any segment
> boundary tracking mechanism does at least those two things.  I don't
> see how we could do that if we didn't update E until F passed both E
> and L.

(Sorry, but I may not have understood you clearly, so the discussion
below might be pointless.)

The ancient patch did:

If a flush didn't reach E, we can archive finished segments.

If a flush ends between E and L, we shouldn't archive finished segments
at all. While in this state, L can be moved further ahead, but E sits
at the same location.

Once a flush passes L, we can archive all finished segments and can
erase both E and L.
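
In code form, that gating is roughly the following (the names are mine,
not the patch's):

/*
 * Sketch of the ancient patch's archive gating, with E and L being the
 * earliest and latest registered segment boundaries.
 */
static bool
may_mark_segments_ready(XLogRecPtr flush, XLogRecPtr E, XLogRecPtr L)
{
	if (flush < E)
		return true;		/* safely before any tracked boundary */
	if (flush <= L)
		return false;		/* within [E, L]: hold back .ready files */

	/* past L: archive everything; the caller also clears E and L */
	return true;
}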

A drawback of this strategy is that the region [E L] can contain gaps
(that is, segment boundaries that are not crossed by a continuation
record), so archiving can be excessively delayed.  Perhaps if the flush
position falls behind the write head by more than two segments, the
probability of creating such gaps would be higher.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: list of acknowledgments for PG14

2021-08-26 Thread Daniel Gustafsson
> On 26 Aug 2021, at 10:41, Peter Eisentraut 
>  wrote:

> Attached is the plain-text list of acknowledgments for the PG14 release 
> notes, current through today.  Please check for problems such as wrong 
> sorting, duplicate names in different variants, or names in the wrong order 
> etc.  (Note that the current standard is given name followed by surname, 
> independent of cultural origin.)

I would have expected “Ö” (Önder Kalacı) to sort after “Z” but that might only
be true for my locale?

--
Daniel Gustafsson   https://vmware.com/





list of acknowledgments for PG14

2021-08-26 Thread Peter Eisentraut


Attached is the plain-text list of acknowledgments for the PG14 release 
notes, current through today.  Please check for problems such as wrong 
sorting, duplicate names in different variants, or names in the wrong 
order etc.  (Note that the current standard is given name followed by 
surname, independent of cultural origin.)
Abhijit Menon-Sen
Ádám Balogh
Adrian Ho
Ahsan Hadi
Ajin Cherian
Aleksander Alekseev
Alessandro Gherardi
Alex Kozhemyakin
Alexander Korotkov
Alexander Lakhin
Alexander Pyhalov
Alexandra Wang
Alexey Bashtanov
Alexey Bulgakov
Alexey Kondratov
Álvaro Herrera
Amit Kapila
Amit Khandekar
Amit Langote
Amul Sul
Anastasia Lubennikova
Andreas Grob
Andreas Kretschmer
Andreas Seltenreich
Andreas Wicht
Andres Freund
Andrew Bille
Andrew Dunstan
Andrew Gierth
Andrey Borodin
Andrey Lepikhov
Andy Fan
Anton Voloshin
Antonin Houska
Arne Roland
Arseny Sher
Arthur Nascimento
Arthur Zakirov
Ashutosh Bapat
Ashutosh Sharma
Ashwin Agrawal
Asif Rehman
Asim Praveen
Atsushi Torikoshi
Aya Iwata
Barry Pederson
Bas Poot
Bauyrzhan Sakhariyev
Beena Emerson
Benoît Lobréau
Bernd Helmle
Bernhard M. Wiedemann
Bertrand Drouvot
Bharath Rupireddy
Boris Kolpackov
Brar Piening
Brian Ye
Bruce Momjian
Bryn Llewellyn
Cameron Daniel
Chapman Flack
Charles Samborski
Charlie Hornsby
Chen Jiaoqian
Chris Wilson
Christoph Berg
Christophe Courtois
Corey Huinker
Craig Ringer
Dagfinn Ilmari Mannsåker
Dana Burd
Daniel Cherniy
Daniel Gustafsson
Daniel Vérité
Daniel Westermann
Daniele Varrazzo
Dar Alathar-Yemen
Darafei Praliaskouski
Dave Cramer
David Christensen
David Fetter
David G. Johnston
David Geier
David Gilman
David Pirotte
David Rowley
David Steele
David Turon
David Zhang
Dean Rasheed
Denis Patron
Dian Fay
Dilip Kumar
Dimitri Nüscheler
Dmitriy Kuzmin
Dmitry Dolgov
Dmitry Marakasov
Domagoj Smoljanovic
Dong Wook
Douglas Doole
Duncan Sands
Edmund Horner
Edson Richter
Ekaterina Kiryanova
Elena Indrupskaya
Emil Iggland
Emre Hasegeli
Eric Thinnes
Erik Rijkers
Erwin Brandstetter
Etienne Stalmans
Etsuro Fujita
Eugen Konkov
Euler Taveira
Fabien Coelho
Fabrízio de Royes Mello
Federico Caselli
Felix Lechner
Floris Van Nee
Frank Gagnepain
Frits Jalvingh
Georgios Kokolatos
Greg Nancarrow
Greg Rychlewski
Greg Sabino Mullane
Gregory Smith
Grigory Smolkin
Guillaume Lelarge
Guy Burgess
Guyren Howe
Haiying Tang
Hamid Akhtar
Hans Buschmann
Hao Wu
Haribabu Kommi
Harisai Hari
Hayato Kuroda
Heath Lord
Heikki Linnakangas
Henry Hinze
Himanshu Upadhyaya
Hironobu Suzuki
Hiroshi Inoue
Honza Horak
Hou Zhijie
Hubert Lubaczewski
Hubert Zhang
Ian Barwick
Ibrar Ahmed
Ildus Kurbangaliev
Isaac Morland
Israel Barth
Itamar Gafni
Jacob Champion
Jaime Casanova
Jaime Soler
Jakub Wartak
James Coleman
James Hilliard
James Hunter
James Inform
Jan Mussler
Japin Li
Jasen Betts
Jason Harvey
Jason Kim
Jeevan Ladhe
Jeff Davis
Jeff Janes
Jelte Fennema
Jeremy Evans
Jeremy Finzel
Jeremy Smith
Jesse Kinkead
Jesse Zhang
Jie Zhang
Jim Doty
Jim Nasby
Jimmy Angelakos
Jimmy Yih
Joe Conway
Joel Jacobson
John Naylor
John Thompson
Jonathan Katz
Josef Šimánek
Joseph Nahmias
Josh Berkus
Juan José Santamaría Flecha
Julien Rouhaud
Junfeng Yang
Jürgen Purtz
Justin Pryzby
Katsuragi Yuta
Kazutaka Onishi
Keisuke Kuroda
Kelly Min
Kensuke Okamura
Kevin Sweet
Kevin Yeap
Kirk Jamison
Kobayashi Hisanori
Kohei KaiGai
Kondo Yuta
Konstantin Knizhnik
Kota Miyake
Krzysztof Gradek
Kuntal Ghosh
Kyle Kingsbury
Kyotaro Horiguchi
Laurent Hasson
Laurenz Albe
Lee Dong Wook
Li Japin
Liu Huailing
Luc Vlaming
Ludovic Kuty
Luis Roberto
Lukas Eder
Maciek Sakrejda
Madan Kumar
Magnus Hagander
Mahendra Singh Thalor
Maksim Milyutin
Marc Boeren
Marcin Krupowicz
Marco Atzeri
Marek Szuba
Marina Polyakova
Mark Dilger
Mark Wong
Mark Zhao
Markus Wanner
Martín Marqués
Martin Visser
Masahiko Sawada
Masahiro Ikeda
Masao Fujii
Mathis Rudolf
Matsumura Ryo
Matthias van de Meent
Matthieu Garrigues
Matthijs van der Vleuten
Maxim Orlov
Melanie Plageman
Merlin Moncure
Michael Banck
Michael Brown
Michael Meskes
Michael Paquier
Michael Paul Killian
Michael Vastola
Michail Nikolaev
Michał Albrycht
Mikael Gustavsson
Movead Li
Muhammad Usama
Nagaraj Raj
Naoki Nakamichi
Nathan Bossart
Nazli Ugur Koyluoglu
Neha Sharma
Neil Chen
Nick Cleaton
Nico Williams
Nikhil Benesch
Nikhil Sontakke
Nikita Glukhov
Nikita Konev
Nikolay Samokhvalov
Nikolay Shaplov
Nitin Jadhav
Noah Misch
Noriyoshi Shinoda
Odin Ugedal
Oleg Bartunov
Oleg Samoilov
Önder Kalacı
Pascal Legrand
Paul Förster
Paul Guo
Paul Jungwirth
Paul Martinez
Paul Sivash
Pavan Deolasee
Pavel Boev
Pavel Borisov
Pavel Luzanov
Pavel Stehule
Pengcheng Liu
Peter Eisentraut
Peter Geoghegan
Peter Smith
Peter Vandivier
Petr Fedorov
Petr Jelínek
Philipp Gramzow
Philippe Beaudoin
Phillip Menke
Pierre Giraud
Quan Zongliang
Rafi Shamim
Rahila Syed
Rajkumar Raghuwanshi
Ranier Vilela
Regina Obe
Rémi Lapeyre
Robert Foggia
Robert Grange
Robert Haas
Robert Kahlert
Robert Sosinski
Robert Treat
Robin Abbi
Robins Tharakan
Roger Mason
Rohit Bhogate
Roman Zharkov
Ron L. Johnson
Ronan Dunklau
Ryan Lambert
Saeed Hubaishan
