Re: Direct I/O

2023-01-24 Thread Bharath Rupireddy
On Thu, Dec 22, 2022 at 7:34 AM Thomas Munro  wrote:
>
> On Wed, Dec 14, 2022 at 5:48 PM Thomas Munro  wrote:
> > 0001 -- David's palloc_aligned() patch 
> > https://commitfest.postgresql.org/41/3999/
> > 0002 -- I/O-align almost all buffers used for I/O
> > 0003 -- Add the GUCs
> > 0004 -- Throwaway hack to make cfbot turn the GUCs on
>
> David pushed the first as commit 439f6175, so here is a rebase of the
> rest.  I also fixed a couple of thinkos in the handling of systems
> where we don't know how to do direct I/O.  In one place I had #ifdef
> PG_O_DIRECT, but that's always defined, it's just that it's 0 on
> Solaris and OpenBSD, and the check to reject the GUC wasn't quite
> right on such systems.

Thanks. I have some comments on
v3-0002-Add-io_direct-setting-developer-only.patch:

1. I think we don't need to overwrite io_direct_string in
check_io_direct; that way, show_io_direct can be avoided.
2. check_io_direct can leak the flags memory when io_direct is not
supported, when the list syntax is invalid, or when an invalid option
is specified.
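
For reference, a leak-free check hook typically has this shape (just a
sketch that assumes the conventional GUC *extra mechanism; the names are
illustrative, not the exact patch text):

bool
check_some_list_guc(char **newval, void **extra, GucSource source)
{
    int         flags = 0;
    int        *result;
    char       *rawstring = pstrdup(*newval);   /* modifiable copy */
    List       *elemlist;

    if (!SplitGUCList(rawstring, ',', &elemlist))
    {
        GUC_check_errdetail("invalid list syntax");
        /* free everything before bailing out, so nothing leaks */
        pfree(rawstring);
        list_free(elemlist);
        return false;
    }

    /* ... parse elemlist into flags, freeing both on any error return ... */

    pfree(rawstring);
    list_free(elemlist);

    /* allocate the extra data only once parsing has succeeded */
    result = (int *) guc_malloc(ERROR, sizeof(int));
    *result = flags;
    *extra = result;
    return true;
}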

I have addressed my review comments as a delta patch on top of v3-0002
and added it here as v1-0001-Review-comments-io_direct-GUC.txt.

Some comments on the tests added:

1. Is there a way to know programmatically whether direct I/O for WAL
and data has been picked up? IOW, can we know if the OS page cache is
bypassed? I know of an external extension, pgfincore, which can help
here, but nothing in core exists AFAICS.
+is('1', $node->safe_psql('postgres', 'select count(*) from t1'),
"read back from shared");
+is('1', $node->safe_psql('postgres', 'select * from t2count'),
"read back from local");
+$node->stop('immediate');

2. Can we combine these two append_conf calls into a single statement?
+$node->append_conf('io_direct', 'data,wal,wal_init');
+$node->append_conf('shared_buffers', '64kB'); # tiny to force I/O

3. A nitpick: can we split these queries across multiple lines instead of
keeping them on a single line?
+$node->safe_psql('postgres', 'begin; create temporary table t2 as
select 1 as i from generate_series(1, 1); update t2 set i = i;
insert into t2count select count(*) from t2; commit;');

4. I don't think we need to stop the node before the test ends, no?
+$node->stop;
+
+done_testing();

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b3eed3d6fc849b9e16fbace1f37d401424f81ab0 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 25 Jan 2023 07:18:11 +
Subject: [PATCH v1] Review comments io_direct GUC

---
 src/backend/storage/file/fd.c   | 57 -
 src/backend/utils/misc/guc_tables.c |  4 +-
 src/include/utils/guc_hooks.h   |  1 -
 3 files changed, 25 insertions(+), 37 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index eb83de4fb9..329acc2ffd 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3749,7 +3749,7 @@ data_sync_elevel(int elevel)
 bool
 check_io_direct(char **newval, void **extra, GucSource source)
 {
-   int*flags = guc_malloc(ERROR, sizeof(*flags));
+   int flags;
 
 #if PG_O_DIRECT == 0
if (strcmp(*newval, "") != 0)
@@ -3757,38 +3757,51 @@ check_io_direct(char **newval, void **extra, GucSource 
source)
GUC_check_errdetail("io_direct is not supported on this 
platform.");
return false;
}
-   *flags = 0;
+   flags = 0;
 #else
-   List   *list;
+   List   *elemlist;
ListCell   *l;
+   char   *rawstring;
 
-   if (!SplitGUCList(*newval, ',', &list))
+   /* Need a modifiable copy of string */
+   rawstring = pstrdup(*newval);
+
+   if (!SplitGUCList(rawstring, ',', &elemlist))
{
GUC_check_errdetail("invalid list syntax in parameter \"%s\"",
"io_direct");
+   pfree(rawstring);
+   list_free(elemlist);
return false;
}
 
-   *flags = 0;
-   foreach (l, list)
+   flags = 0;
+   foreach (l, elemlist)
{
char   *item = (char *) lfirst(l);
 
if (pg_strcasecmp(item, "data") == 0)
-   *flags |= IO_DIRECT_DATA;
+   flags |= IO_DIRECT_DATA;
else if (pg_strcasecmp(item, "wal") == 0)
-   *flags |= IO_DIRECT_WAL;
+   flags |= IO_DIRECT_WAL;
else if (pg_strcasecmp(item, "wal_init") == 0)
-   *flags |= IO_DIRECT_WAL_INIT;
+   flags |= IO_DIRECT_WAL_INIT;
else
{
GUC_check_errdetail("invalid option \"%s\"", item);
+   pfree(rawstring);
+   list_free(elemlist);
return false;
}
}
+
+   

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-01-24 Thread Kyotaro Horiguchi
At Tue, 24 Jan 2023 14:35:12 -0800, Andres Freund  wrote in 
> > 0002:
> > 
> > +   Assert(pgstat_bktype_io_stats_valid(bktype_shstats, MyBackendType));
> > 
> > This is relatively complex checking. We already assert out increments
> > of invalid counters, so this is checking whether some unrelated code
> > clobbered them, which we do only when consistency is critical. Is
> > there any need to do that here?  I saw another occurrence of the same
> > assertion.
> 
> I found it useful to find problems.

Okay.

> > +   no_temp_rel = bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
> > +   bktype == B_CHECKPOINTER || bktype == B_AUTOVAC_WORKER ||
> > +   bktype == B_STANDALONE_BACKEND || bktype == B_STARTUP;
> > 
> > I'm not sure I like to omit parentheses for such a long Boolean
> > expression on the right side.
> 
> What parens would help?

I thought about the following.

no_temp_rel =
   (bktype == B_AUTOVAC_LAUNCHER ||
bktype == B_BG_WRITER ||
bktype == B_CHECKPOINTER ||
bktype == B_AUTOVAC_WORKER ||
bktype == B_STANDALONE_BACKEND ||
bktype == B_STARTUP);


> > +   write_chunk_s(fpout, );
> > +   if (!read_chunk_s(fpin, >io.stats))
> > 
> > The names of the functions hardly make sense alone to me. How about
> > write_struct()/read_struct()?  (I personally prefer to use
> > write_chunk() directly..)
> 
> That's not related to this patch - there are several existing callers for
> it. And write_struct wouldn't be better imo, because it's not just for
> structs.

Hmm.  Then what does the "_s" stand for?
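
Looking at pgstat.c, they appear to be thin wrappers that supply the
sizeof, so maybe "_s" is for "sizeof"? That is, roughly:

#define write_chunk_s(fpout, ptr) write_chunk(fpout, ptr, sizeof(*ptr))
#define read_chunk_s(fpin, ptr) read_chunk(fpin, ptr, sizeof(*ptr))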

> > + PgStat_BktypeIO
> > 
> > This patch abbreviates "backend" as "bk" but "be" is used in many
> > places. I think that naming should follow the predecessors.
> 
> The precedents aren't consistent, unfortunately :)

Uuuum.  Okay, I just like "be" there!  Anyway, I won't push that
strongly.

> > > +Number of read operations in units of 
> > > op_bytes.
> > 
> > I may be the only one who sees the name as ambiguous between "total
> > number of handled bytes" and "bytes handled per operation". Can't it
> > be op_blocksize or just block_size?
> > 
> > +   b.io_object,
> > +   b.io_context,
> 
> No, block wouldn't be helpful - we'd like to use this for something that isn't
> uniform blocks.

What does the field show in that case?  The mean operation size? Or
one row per operation size?  If the former, the name looks somewhat
wrong. If the latter, block_size seems to make sense.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: drop postmaster symlink

2023-01-24 Thread Peter Eisentraut

On 12.01.23 20:11, Devrim Gündüz wrote:

On Thu, 2023-01-12 at 13:35 -0500, Joe Conway wrote:

To be clear, I am completely in agreement with you about removing the
symlink. I just wanted to be sure Devrim was alerted because I knew
he  had a strong opinion on this topic ;-)


Red Hat's own packages, and thus their users, may be unhappy about that,
too. They also call postmaster directly.


Devrim,

Apart from your concerns, it appears there is consensus for making this 
change.  The RPM packaging scripts can obviously be fixed easily for 
this.  Do you have an objection to making this change?






Re: Record queryid when auto_explain.log_verbose is on

2023-01-24 Thread Michael Paquier
On Tue, Jan 24, 2023 at 11:01:46PM +0900, torikoshia wrote:
> On 2023-01-23 09:35, Michael Paquier wrote:
>> ExplainPrintTriggers() is kind of different because there is
>> auto_explain_log_triggers.  Still, we could add a flag in ExplainState
>> deciding if the triggers should be printed, so as it would be possible
>> to move ExplainPrintTriggers() and ExplainPrintJITSummary() within
>> ExplainPrintPlan(), as well?  The same kind of logic could be applied
>> for the planning time and the buffer usage if these are tracked in
>> ExplainState rather than being explicit arguments of ExplainOnePlan().
>> Not to mention that this reduces the differences between
>> ExplainOneUtility() and ExplainOnePlan().
> 
> Hmm, this refactoring would be worth considering, but shouldn't it be done
> in another patch?

It could be.  I'm fine with not doing that as a first step, as the
query ID computation is done just after ExplainPrintPlan().  An
argument could be made about ExplainPrintPlan(), though
compute_query_id = regress offers an option to control that, as well.

>> Leaving this comment aside, I think that this should have at least one
>> regression test in 001_auto_explain.pl, where query_log() can be
>> called while the verbose GUC of auto_explain is enabled.
> 
> Agreed.
> Added a test for queryid logging.

Thanks.  Will check and probably apply on HEAD.
--
Michael




Re: recovery modules

2023-01-24 Thread Michael Paquier
On Mon, Jan 23, 2023 at 01:44:28PM -0800, Nathan Bossart wrote:
> On Mon, Jan 23, 2023 at 11:44:57AM +0900, Michael Paquier wrote:
> I updated the documentation as you suggested, and I renamed basic_archive
> to basic_wal_module.

Thanks.  The renaming of basic_archive to basic_wal_module looked
fine, so applied.

While looking at the docs, I found it a bit confusing that the main
section on the WAL modules included the full description of the
archive modules, so I have moved that into the sect2 dedicated to the
archive modules instead, as in the attached.  0004 has been updated
accordingly, with details about the recovery bits within its own
sect2.

>>  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> - errmsg("must specify restore_command when standby mode is not 
>> enabled")));
>> + errmsg("must specify restore_command or a restore_library that defines 
>> "
>> +"a restore callback when standby mode is not enabled")));
>> This is long.  Shouldn't this be split into an errdetail() to list all
>> the options at hand?
> 
> Should the errmsg() be something like "recovery parameters misconfigured"?

Hmm.  Here is an idea:
- errmsg: "must specify restore option when standby mode is not enabled"
- errdetail: "Either restore_command or restore_library need to be
specified."
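
In code form, that would be something like (a sketch):

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("must specify a restore option when standby mode is not enabled"),
             errdetail("Either restore_command or restore_library needs to be specified.")));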

>> -   if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
>> -   ereport(ERROR,
>> -   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> -errmsg("both archive_command and archive_library set"),
>> -errdetail("Only one of archive_command, archive_library may 
>> be set.")));
>> +   CheckMutuallyExclusiveGUCs(XLogArchiveLibrary, "archive_library",
>> +  XLogArchiveCommand, "archive_command");
>> 
>> The introduction of this routine could be a patch on its own, as it
>> impacts the archiving path.
> 
> I moved this to a separate patch.

While pondering that, I found it a bit sad that this only works for
string GUCs, while it could be possible to do the same kind of checks
for the other GUC types with a more generic routine?  Not enum,
obviously, but int, float, bool and real, with a failure if both
GUCs are set to non-default values?  Also, and I may be missing
something here, do we really need to pass the values of the parameters
to check?  Wouldn't it be enough to check for the case where both
parameters are set to their non-default values after reloading?
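
For the string case, such a routine presumably has a shape like this (a
sketch only, with illustrative names; the actual patch may differ):

static void
CheckMutuallyExclusiveStringGUCs(const char *val1, const char *name1,
                                 const char *val2, const char *name2)
{
    if (val1[0] != '\0' && val2[0] != '\0')
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("both %s and %s set", name1, name2),
                 errdetail("Only one of %s and %s may be set.",
                           name1, name2)));
}

A generic variant would instead need a way to ask whether each GUC
differs from its boot value, rather than taking the values directly.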

>> - Startup reloading, as of StartupRereadConfig().  This code could
>> involve a WAL receiver restart depending on a change in the slot
>> or in primary_conninfo, and force at the same time an ERROR
>> because of conflicting recovery library and command configuration.
>> This one should be safe because pendingWalRcvRestart would just be
>> considered later on by the startup process itself while waiting for
>> WAL to become available.  Still this could deserve a comment?  Even if
>> there is a misconfiguration, a reload on a standby would enforce a
>> FATAL in the startup process, taking down the whole server.
> 
> Do you think the parameter checks should go before the WAL receiver restart
> logic?

Yeah, switching the order makes the logic more robust IMO.

>> - Checkpointer initialization, as of CheckpointerMain().  A
>> configuration failure in this code path, aka server startup, causes
>> the server to loop infinitely on FATAL with the misconfiguration
>> showing up all the time..  This is a problem.
> 
> Perhaps this is a reason to move the parameter check in CheckpointerMain()
> to after the sigsetjmp() block.  This should avoid full server restarts.
> Only the checkpointer process would loop with the ERROR.

The loop part is annoying..  I've never been a fan of adding these
cross-value checks for the archiver modules in the first place, and it
would make things much simpler in the checkpointer if we need to think
about that as we want these values to be reloadable.  Perhaps this
could just be an exception where we just give priority to one over the
other, archive_cleanup_command?  The startup process has a well-defined
sequence after a failure, while the checkpointer is designed to remain
robust.

>> - Last comes the checkpointer GUC reloading, as of
>> HandleCheckpointerInterrupts(), with a second problem.  This
>> introduces a failure path where ConfigReloadPending is processed at
>> the same time as ShutdownRequestPending based on the way it is coded,
>> interacting with what would be a normal shutdown in some cases?  And
>> actually, if you enforce a misconfiguration on reload, the
>> checkpointer reports an error but it does not enforce a process
>> restart, hence it keeps around the new, incorrect, configuration while
>> waiting for a new checkpoint to happen once restore_library and
>> archive_cleanup_command are set.  This could lead to surprises, IMO.
>> Upgrading to a FATAL in this code path triggers an infinite loop, like
>> the startup path.
> 
> 

Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-24 Thread Kyotaro Horiguchi
At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart  
wrote in 
> On Tue, Jan 24, 2023 at 01:13:55PM -0500, Tom Lane wrote:
> > Either that comment needs to be rewritten or we need to invent some
> > more macros.
> 
> Here is a first attempt at a patch.  I scanned through all the existing
> uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything
> else that needed adjusting.

There seem to be two occurrences of DSA_HANDLE_INVALID, in dsa_get_handle
and dsa_attach_in_place, one of which is in an Assert(), though.
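
For reference, the kind of macro being discussed is presumably along
these lines (a sketch; dsa_handle is a typedef of dsm_handle):

#define DSA_HANDLE_INVALID ((dsa_handle) DSM_HANDLE_INVALID)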

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread Amit Kapila
On Wed, Jan 25, 2023 at 11:57 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 24 Jan 2023 12:19:04 +, "Takamichi Osumi (Fujitsu)" 
>  wrote in
> > Attached the patch v20 that has incorporated all comments so far.
>
...
>
>
> +  in which case no additional wait is necessary. If the system clocks
> +  on publisher and subscriber are not synchronized, this may lead to
> +  apply changes earlier than expected, but this is not a major issue
> +  because this parameter is typically much larger than the time
> +  deviations between servers. Note that if this parameter is set to a
>
> This doesn't seem to fit our documentation. It is not our business
> whether a certain amount of deviation is critical or not. How about
> something like the following?
>

But we have a similar description for 'recovery_min_apply_delay' [1].
See "...If the system clocks on primary and standby are not
synchronized, this may lead to recovery applying records earlier than
expected; but that is not a major issue because useful settings of
this parameter are much larger than typical time deviations between
servers."

[1] - https://www.postgresql.org/docs/devel/runtime-config-replication.html

-- 
With Regards,
Amit Kapila.




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread Amit Kapila
On Wed, Jan 25, 2023 at 11:23 AM Takamichi Osumi (Fujitsu)
 wrote:
>
>
> Thank you for checking the patch !
> On Wednesday, January 25, 2023 10:17 AM Kyotaro Horiguchi 
>  wrote:
> > In short, I'd like to propose renaming the parameter in_delayed_apply of
> > send_feedback to "has_unprocessed_change".
> >
> > At Tue, 24 Jan 2023 12:27:58 +0530, Amit Kapila 
> > wrote in
> > > > send_feedback():
> > > > +* If the subscriber side apply is delayed (because of
> > time-delayed
> > > > +* replication) then do not tell the publisher that the received
> > latest
> > > > +* LSN is already applied and flushed, otherwise, it leads to 
> > > > the
> > > > +* publisher side making a wrong assumption of logical
> > replication
> > > > +* progress. Instead, we just send a feedback message to avoid a
> > publisher
> > > > +* timeout during the delay.
> > > >  */
> > > > -   if (!have_pending_txes)
> > > > +   if (!have_pending_txes && !in_delayed_apply)
> > > > flushpos = writepos = recvpos;
> > > >
> > > > Honestly I don't like this wart. The reason for this is the function
> > > > assumes recvpos = applypos but we actually call it while holding
> > > > unapplied changes, that is, applypos < recvpos.
> > > >
> > > > Couldn't we maintain an additional static variable "last_applied"
> > > > along with last_received?
> > > >
> > >
> > > It won't be easy to maintain the meaning of last_applied because there
> > > are cases where we don't apply the change directly. For example, in
> > > case of streaming xacts, we will just keep writing it to the file,
> > > now, say, due to some reason, we have to send the feedback, then it
> > > will not allow you to update the latest write locations. This would
> > > then become different from what we are doing without the patch.
> > > Another point to think about is that we also need to keep the variable
> > > updated for keep-alive ('k') messages even though we don't apply
> > > anything in that case. Still, other cases to consider are where we
> > > have mix of streaming and non-streaming transactions.
> >
> > Yeah.  Even though I named it as "last_applied", its objective is to have
> > get_flush_position returning the correct have_pending_txes without a hint
> > from callers, that is, "let g_f_position know if store_flush_position has 
> > been
> > called with the last received data".
> >
> > Anyway I tried that but didn't find a clean and simple way. However, while 
> > on it,
> > I realized what it was about the code that confused me.
> >
> > +static void send_feedback(XLogRecPtr recvpos, bool force, bool
> > requestReply,
> > +   bool in_delayed_apply);
> >
> > The name "in_delayed_apply" doesn't donsn't give me an idea of what the
> > function should do for it. If it is named "has_unprocessed_change", I think 
> > it
> > makes sense that send_feedback should think there may be an outstanding
> > transaction that is not known to the function.
> >
> >
> > So, my conclusion here is I'd like to propose changing the parameter name to
> > "has_unapplied_change".
Renamed the variable to "has_unprocessed_change".
Also removed the first argument of send_feedback(), which isn't necessary
now.
>

Why did you remove the first argument of send_feedback() when it was
not added by this patch? If you really think that is an improvement,
feel free to propose it as a separate patch. Personally, I don't see
value in it.

-- 
With Regards,
Amit Kapila.




RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-24 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> 
> I have updated the patch accordingly and it looks good to me. I'll
> push this first patch early next week (Monday) unless there are more
> comments.

Thanks for updating. I checked v88-0001 and I have no objection. LGTM.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: document the need to analyze partitioned tables

2023-01-24 Thread Laurenz Albe
On Wed, 2023-01-25 at 16:26 +1300, David Rowley wrote:
> On Wed, 18 Jan 2023 at 22:15, Laurenz Albe  wrote:
> > Attached is a new version of my patch that tries to improve the wording.
> 
> I had a look at this and agree that we should adjust the paragraph in
> question if people are finding it confusing.
> 
> I started to adjust that but since the text is fairly short it turned
> out quite different from what you had.
> 
> I ended up with:
> 
> +    With partitioned tables, since these do not directly store tuples, these
> +    do not require autovacuum to perform any VACUUM
> +    operations.  Autovacuum simply performs a VACUUM on 
> the
> +    partitioned table's partitions the same as it does with normal tables.
> +    However, the same is true for ANALYZE operations, and
> +    this can be problematic as there are various places in the query planner
> +    that attempt to make use of table statistics for partitioned tables when
> +    partitioned tables are queried.  For now, you can work around this 
> problem
> +    by running a manual ANALYZE command on the partitioned
> +    table when the partitioned table is first populated, and again whenever
> +    the distribution of data in its partitions changes significantly.
> 
> which I've also attached in patch form.
> 
> I know there's been a bit of debate on the wording and a few patches,
> so I may not be helping.  If nobody is against the above, then I don't
> mind going ahead with it and backpatching to whichever version this
> first applies to. I just felt I wasn't 100% happy with what was being
> proposed.

Thanks, your help is welcome.

Did you see Justin's wording suggestion in
https://postgr.es/m/20230118174919.GA9837%40telsasoft.com ?
He didn't attach it as a patch, so you may have missed it.
I was pretty happy with that.

I think your first sentence is a bit clumsy and might be streamlined to

  Partitioned tables do not directly store tuples and consequently do not
  require autovacuum to perform any VACUUM operations.

Also, I am a little bit unhappy about

1. Your paragraph states that partitioned tables need no autovacuum,
   but doesn't state unmistakably that they will never be treated
   by autovacuum.

2. You make a distinction between table partitions and "normal tables",
   but really there is no distinction.

Perhaps I am being needlessly picky here...

Yours,
Laurenz Albe




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-24 Thread Amit Kapila
On Wed, Jan 25, 2023 at 10:05 AM Amit Kapila  wrote:
>
> On Wed, Jan 25, 2023 at 3:15 AM Peter Smith  wrote:
> >
> > 1.
> > @@ -210,7 +210,7 @@ int logical_decoding_work_mem;
> >  static const Size max_changes_in_memory = 4096; /* XXX for restore only */
> >
> >  /* GUC variable */
> > -int logical_decoding_mode = LOGICAL_DECODING_MODE_BUFFERED;
> > +int logical_replication_mode = LOGICAL_REP_MODE_BUFFERED;
> >
> >
> > I noticed that the comment /* GUC variable */ is currently only above
> > the logical_replication_mode, but actually logical_decoding_work_mem
> > is a GUC variable too. Maybe this should be rearranged somehow, and
> > then the comment changed from "GUC variable" -> "GUC variables"?
> >
>
I think moving these variables together doesn't sound like a good idea
because the logical_decoding_work_mem variable is defined with other
related variables. Also, if we do the last comment, I think that
will obviate the need for this.
>
> > ==
> > src/backend/utils/misc/guc_tables.c
> >
> > @@ -4908,13 +4908,13 @@ struct config_enum ConfigureNamesEnum[] =
> >   },
> >
> >   {
> > - {"logical_decoding_mode", PGC_USERSET, DEVELOPER_OPTIONS,
> > + {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS,
> >   gettext_noop("Allows streaming or serializing each change in logical
> > decoding."),
> >   NULL,
> >   GUC_NOT_IN_SAMPLE
> >   },
> > - &logical_decoding_mode,
> > - LOGICAL_DECODING_MODE_BUFFERED, logical_decoding_mode_options,
> > + &logical_replication_mode,
> > + LOGICAL_REP_MODE_BUFFERED, logical_replication_mode_options,
> >   NULL, NULL, NULL
> >   },
> >
> > That gettext_noop string seems incorrect. I think Kuroda-san
> > previously reported the same, but then you replied it has been fixed
> > already [1]
> >
> > > I felt the description seems not to be suitable for current behavior.
> > > A short description should be like "Sets a behavior of logical 
> > > replication", and
> > further descriptions can be added in the long description.
> > I adjusted the description here.
> >
> > But this doesn't look fixed to me. (??)
> >
>
> Okay, so, how about the following for the 0001 patch:
> short desc: Controls when to replicate each change.
> long desc: On the publisher, it allows streaming or serializing each
> change in logical decoding.
>

I have updated the patch accordingly and it looks good to me. I'll
push this first patch early next week (Monday) unless there are more
comments.

-- 
With Regards,
Amit Kapila.


v88-0001-Rename-GUC-logical_decoding_mode-to-logical_repl.patch
Description: Binary data


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread Kyotaro Horiguchi
At Tue, 24 Jan 2023 12:19:04 +, "Takamichi Osumi (Fujitsu)" 
 wrote in 
> Attached the patch v20 that has incorporated all comments so far.

Thanks! I looked through the documentation part.


+  
+   subminapplydelay int8
+  
+  
+   Total time spent delaying the application of changes, in milliseconds.
+  

I was confused because it reads as if this column shows the accumulated
actual waiting time caused by min_apply_delay.  IIUC it actually shows
the min_apply_delay setting for the subscription. Thus shouldn't it be
something like this?

"The minimum amount of time to delay applying changes, in milliseconds"

And it might be better to mention the corresponding subscription parameter.


+   error. If wal_receiver_status_interval is set to
+   zero, the apply worker doesn't send any feedback messages during the
+   min_apply_delay period.

It took me a bit longer to understand what this sentence means.  I'd
like to suggest something like the following.

"Since no status-update messages are sent while delaying, note that
wal_receiver_status_interval is the only source of keepalive messages
during that period."

+  
+   A logical replication subscription can delay the application of changes by
+   specifying the min_apply_delay subscription parameter.
+   See  for details.
+  

I'm not sure "logical replication subscription" is a common term.
Doesn't just "subscription" mean the same, especially in that context?
(Note that 31.2 starts with "A subscription is the downstream..").


+  Any delay occurs only on WAL records for transaction begins after all
+  initial table synchronization has finished. The delay is calculated

There is no "transaction begin" WAL records.  Maybe it is "logical
replication transaction begin message". The timestamp is of "commit
time".  (I took "transaction begins" as a noun, but that might be
wrong..)


+may reduce the actual wait time. It is also possible that the overhead
+already exceeds the requested min_apply_delay value,
+in which case no additional wait is necessary. If the system clocks

I'm not sure it is right to say "necessary" here.  IMHO it might
better be "in which case no delay is applied".


+  in which case no additional wait is necessary. If the system clocks
+  on publisher and subscriber are not synchronized, this may lead to
+  apply changes earlier than expected, but this is not a major issue
+  because this parameter is typically much larger than the time
+  deviations between servers. Note that if this parameter is set to a

This doesn't seem to fit our documentation. It is not our business
whether a certain amount of deviation is critical or not. How about
something like the following?

"Note that the delay is measured between the timestamp assigned by
publisher and the system clock on subscriber.  You need to manage the
system clocks to be in sync so that the delay works properly."

+Delaying the replication can mean there is a much longer time
+between making a change on the publisher, and that change being
+committed on the subscriber. This can impact the performance of
+synchronous replication. See 
+parameter.

Do we need the "can" in "Delaying the replication can mean"?  If we
want to say that, it might be "Delaying the replication means there can be
a much longer..."?


+  
+   Create a subscription to a remote server that replicates tables in
+   the mypub publication and starts replicating immediately
+   on commit. Pre-existing data is not copied. The application of changes is
+   delayed by 4 hours.
+
+CREATE SUBSCRIPTION mysub
+ CONNECTION 'host=192.0.2.4 port=5432 user=foo dbname=foodb'
+PUBLICATION mypub
+   WITH (copy_data = false, min_apply_delay = '4h');
+

I'm not sure we need this additional example. We already have two
examples, one of which differs from the above only in the actual values for
the PUBLICATION and WITH clauses.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread Takamichi Osumi (Fujitsu)
Hi, Horiguchi-san


Thank you for checking the patch !
On Wednesday, January 25, 2023 10:17 AM Kyotaro Horiguchi 
 wrote:
> In short, I'd like to propose renaming the parameter in_delayed_apply of
> send_feedback to "has_unprocessed_change".
> 
> At Tue, 24 Jan 2023 12:27:58 +0530, Amit Kapila 
> wrote in
> > > send_feedback():
> > > +* If the subscriber side apply is delayed (because of
> time-delayed
> > > +* replication) then do not tell the publisher that the received
> latest
> > > +* LSN is already applied and flushed, otherwise, it leads to the
> > > +* publisher side making a wrong assumption of logical
> replication
> > > +* progress. Instead, we just send a feedback message to avoid a
> publisher
> > > +* timeout during the delay.
> > >  */
> > > -   if (!have_pending_txes)
> > > +   if (!have_pending_txes && !in_delayed_apply)
> > > flushpos = writepos = recvpos;
> > >
> > > Honestly I don't like this wart. The reason for this is the function
> > > assumes recvpos = applypos but we actually call it while holding
> > > unapplied changes, that is, applypos < recvpos.
> > >
> > > Couldn't we maintain an additional static variable "last_applied"
> > > along with last_received?
> > >
> >
> > It won't be easy to maintain the meaning of last_applied because there
> > are cases where we don't apply the change directly. For example, in
> > case of streaming xacts, we will just keep writing it to the file,
> > now, say, due to some reason, we have to send the feedback, then it
> > will not allow you to update the latest write locations. This would
> > then become different from what we are doing without the patch.
> > Another point to think about is that we also need to keep the variable
> > updated for keep-alive ('k') messages even though we don't apply
> > anything in that case. Still, other cases to consider are where we
> > have mix of streaming and non-streaming transactions.
> 
> Yeah.  Even though I named it as "last_applied", its objective is to have
> get_flush_position returning the correct have_pending_txes without a hint
> from callers, that is, "let g_f_position know if store_flush_position has been
> called with the last received data".
> 
> Anyway I tried that but didn't find a clean and simple way. However, while on 
> it,
> I realized what it was about the code that confused me.
> 
> +static void send_feedback(XLogRecPtr recvpos, bool force, bool
> requestReply,
> +   bool in_delayed_apply);
> 
> The name "in_delayed_apply" doesn't donsn't give me an idea of what the
> function should do for it. If it is named "has_unprocessed_change", I think it
> makes sense that send_feedback should think there may be an outstanding
> transaction that is not known to the function.
> 
> 
> So, my conclusion here is I'd like to propose changing the parameter name to
> "has_unapplied_change".
Renamed the variable to "has_unprocessed_change".
Also removed the first argument of send_feedback(), which isn't necessary
now.
Kindly have a look at the patch shared in [1].


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373193B4331B7EB6276F682EDCE9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi





RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread Takamichi Osumi (Fujitsu)
Hi, 

On Wednesday, January 25, 2023 2:02 PM shveta malik  
wrote:
> On Tue, Jan 24, 2023 at 5:49 PM Takamichi Osumi (Fujitsu)
>  wrote:
> >
> >
> > Attached the patch v20 that has incorporated all comments so far.
> > Kindly have a look at the attached patch.
> Thank you for the patch. My previous comments are addressed. I tested it and
> it looks good. Logging is also fine now.
> 
> Just one comment, in summary, we see :
> If the subscription sets min_apply_delay parameter, the logical replication
> worker will delay the transaction commit for min_apply_delay milliseconds.
> 
> Is it better to write "delay the transaction apply" instead of "delay the
> transaction commit", just to be consistent, as we do not actually delay the
> commit for regular transactions?
Thank you for your review !

Agreed. Your description looks better.
Attached the updated patch v21.


Best Regards,
Takamichi Osumi



v21-0001-Time-delayed-logical-replication-subscriber.patch
Description:  v21-0001-Time-delayed-logical-replication-subscriber.patch


Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Michael Paquier
On Wed, Jan 25, 2023 at 10:32:10AM +0900, Michael Paquier wrote:
> Thanks, my memory was fuzzy regarding that.  I am curious if the error
> in the recovery tests will persist with that set up.  The next run
> will be in a few hours, so let's see..

So it looks like tanager is able to reproduce the failure of this
thread much more frequently than the other animals:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tanager&dt=2023-01-25%2003%3A05%3A05

That's interesting.  FWIW, this environment is just a Raspberry Pi 4
with 8GB of memory, building with clang.
--
Michael




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread shveta malik
On Tue, Jan 24, 2023 at 5:49 PM Takamichi Osumi (Fujitsu)
 wrote:
>
>
> Attached the patch v20 that has incorporated all comments so far.
> Kindly have a look at the attached patch.
>
>
> Best Regards,
> Takamichi Osumi
>

Thank you for the patch. My previous comments are addressed. I tested it
and it looks good. Logging is also fine now.

Just one comment, in summary, we see :
If the subscription sets min_apply_delay parameter, the logical
replication worker will delay the transaction commit for
min_apply_delay milliseconds.

Is it better to write "delay the transaction apply" instead of "delay
the transaction commit", just to be consistent, as we do not actually
delay the commit for regular transactions?

thanks
Shveta




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-24 Thread Andres Freund
Hi,

On 2023-01-23 19:22:18 -0800, Peter Geoghegan wrote:
> On Mon, Jan 23, 2023 at 6:56 PM Andres Freund  wrote:
> > Why is there TABLE_ in AUTOVACUUM_TABLE_XID_AGE but not
> > AUTOVACUUM_DEAD_TUPLES? Both are on tables.
> 
> Why does vacuum_freeze_table_age contain the word "table", while
> autovacuum_vacuum_scale_factor does not?

I don't know. But that's not really a reason to introduce more oddities.


> To me, "table XID age" is a less awkward term for "relfrozenxid
> advancing", useful in contexts where it's probably more important to
> be understood by non-experts than it is to be unambiguous. Besides,
> relfrozenxid works at the level of the pg_class metadata. Nothing
> whatsoever needs to have changed about the table itself, nor will
> anything necessarily be changed by VACUUM (except the relfrozenxid
> field from pg_class).

I'd just go for "xid age", I don't see a point in adding 'table', particularly
when you don't for dead tuples.


> > What do you think about naming this VacuumTriggerType and adding an
> > VAC_TRIG_MANUAL or such?
> 
> But we're not doing anything with it in the context of manual VACUUMs.

It's a member of a struct passed to the routines handling both manual and
interactive vacuum.  And we could, e.g., eventually start replacing
IsAutoVacuumWorkerProcess() checks with it - which isn't going to work
well if we add parallel index vacuuming support to autovacuum.



> I'd prefer to keep this about what autovacuum.c thought needed to
> happen, at least for as long as manual VACUUMs are something that
> autovacuum.c knows nothing about.

It's an enum defined in a general header, not something in autovacuum.c - so I
don't really buy this.


> > > - boolforce_vacuum;
> > > + TransactionId relfrozenxid = classForm->relfrozenxid;
> > > + MultiXactId relminmxid = classForm->relminmxid;
> > > + AutoVacType trigger = AUTOVACUUM_NONE;
> > > + booltableagevac;
> >
> > Here + below we end up with three booleans that just represent the choices 
> > in
> > our fancy new enum. That seems awkward to me.
> 
> I don't follow. It's true that "wraparound" is still a synonym of
> "tableagevac" in 0001, but that changes in 0002. And even if you
> assume that 0002 won't get in, I think that it still makes sense to
> structure it in a way that shows that in principle the "wraparound"
> behaviors don't necessarily have to be used whenever "tableagevac" is
> in use.

You have booleans tableagevac, deadtupvac, inserttupvac. Particularly the
latter ones really are just a rephrasing of the trigger:

+   tableagevac = true;
+   *wraparound = false;
+   /* See header comments about trigger precedence */
+   if (TransactionIdIsNormal(relfrozenxid) &&
+   TransactionIdPrecedes(relfrozenxid, xidForceLimit))
+   trigger = AUTOVACUUM_TABLE_XID_AGE;
+   else if (MultiXactIdIsValid(relminmxid) &&
+MultiXactIdPrecedes(relminmxid, multiForceLimit))
+   trigger = AUTOVACUUM_TABLE_MXID_AGE;
+   else
+   tableagevac = false;
+
+   /* User disabled non-table-age autovacuums in pg_class.reloptions? */
+   if (!av_enabled && !tableagevac)

...

+   deadtupvac = (vactuples > vacthresh);
+   inserttupvac = (vac_ins_base_thresh >= 0 && instuples > 
vacinsthresh);
+   /* See header comments about trigger precedence */
+   if (!tableagevac)
+   {
+   if (deadtupvac)
+   trigger = AUTOVACUUM_DEAD_TUPLES;
+   else if (inserttupvac)
+   trigger = AUTOVACUUM_INSERTED_TUPLES;
+   }
+
/* Determine if this table needs vacuum or analyze. */
-   *dovacuum = force_vacuum || (vactuples > vacthresh) ||
-   (vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
+   *dovacuum = (tableagevac || deadtupvac || inserttupvac);


I find this to be awkward code. The booleans are kinda pointless, and the
tableagevac case is hard to follow because trigger is set elsewhere.

I can give reformulating it a go. Need to make some food first.


I suspect that the code would look better if we didn't continue to have
"bool *dovacuum" and the trigger. They're redundant.


> Anything else for 0001? Would be nice to get it committed tomorrow.

Sorry, today was busy with meetings and bashing my head against AIX.

Greetings,

Andres Freund




Re: [PoC] Federated Authn/z with OAUTHBEARER

2023-01-24 Thread mahendrakar s
Hi All,

The "issuer" field has been removed to align  with the RFC
implementation - https://www.rfc-editor.org/rfc/rfc7628.
This patch "v6" is a single patch to support the OAUTH BEARER token
through psql connection string.
Below flow is supported. Added the documentation in the commit messages.

 +--+ +--+
  | +---+  | Postgres |
  | PQconnect ->|   |  |  |
  | |   |  |   +---+
  | |   | -- Empty Token-> | > |   |
  | | libpq | <-- Error(Discovery + Scope ) -- | < | Pre-Auth  |
  |  +--+   |  |   |  Hook |
  | +- < | Hook |   |  |   +---+
  | |+--+   |  |  |
  | v   |   |  |  |
  |  [get token]|   |  |  |
  | |   |   |  |  |
  | +   |   |  |   +---+
  | PQconnect > |   | - Access Token > | > | Validator |
  | |   | <-- Auth Result  | < |   Hook|
  | |   |  |   +---+
  | +---+  |  |
  +--+ +--+

Please note that we are working on modifying/adding new tests (from
Jacob's Patch) with the latest changes. Will add a patch with tests
soon.

Thanks,
Mahendrakar.

On Wed, 18 Jan 2023 at 07:24, Andrey Chudnovsky  wrote:
>
> > All of this points at a bigger question to the community: if we choose
> > not to provide a flow implementation in libpq, is adding OAUTHBEARER
> > worth the additional maintenance cost?
>
> > My personal vote would be "no". I think the hook-only approach proposed
> > here would ensure that only larger providers would implement it in
> > practice
>
> Flow implementations in libpq are definitely a long term plan, and I
> agree that it would democratise the adoption.
> In the previous posts in this conversation I outlined the ones I think
> we should support.
>
> However, I don't see why it's strictly necessary to couple those.
> As long as the SASL exchange for OAUTHBEARER mechanism is supported by
> the protocol, the Client side can evolve at its own pace.
>
> At the same time, the current implementation allows clients to start
> building provider-agnostic OAUTH support, by using iddawc or OAUTH
> client implementations on the respective platforms.
> So I wouldn't refer to "larger providers", but rather "more motivated
> clients" here. Those definitely overlap, but it keeps the system open.
>
> > I'm not understanding the concern in the final point -- providers
> > generally require you to opt into device authorization, at least as far
> > as I can tell. So if you decide that it's not appropriate for your use
> > case... don't enable it. (And I haven't seen any claims that opting into
> > device authorization weakens the other flows in any way. So if we're
> > going to implement a flow in libpq, I still think device authorization
> > is the best choice, since it works on headless machines as well as those
> > with browsers.)
> I agree with the statement that Device code is the best first choice
> if we absolutely have to pick one.
> Though I don't think we have to.
>
> While device flow can be used for all kinds of user-facing
> applications, it's specifically designed for input-constrained
> scenarios. As clearly stated in the Abstract here -
> https://www.rfc-editor.org/rfc/rfc8628
> The authorization code with PKCE flow is recommended by the RFCs and
> major providers for cases where it's feasible.
> The long term goal is to provide both, though I don't see why the
> backbone protocol implementation first wouldn't add value.
>
> Another point is that user authentication is one side of the whole story,
> and the other critical one is system-to-system authentication, where
> we have Client Credentials and Certificates.
> The latter is much harder to implement generically, as
> provider-specific tokens need to be signed.
>
> Adding the other reasoning, I think libpq support for specific flows
> can come in further iterations, after the protocol support.
>
> > in that case I'd rather spend cycles on generic SASL.
> I see 2 approaches to generic SASL:
> (a). Generic SASL is a framework used in the protocol, with the
> mechanisms implemented on top and exposed to the DBAs as auth types to
> configure in hba.
> This is the direction we're going here, which is well aligned with the
> existing hba-based auth configuration.
> (b). Generic SASL exposed to developers on the 

Re: plpython vs _POSIX_C_SOURCE

2023-01-24 Thread Tom Lane
Andres Freund  writes:
> Patches attached.

+1 for 0001.  I'm still nervous about 0002.  However, maybe the
cases that we had trouble with are legacy issues that nobody cares
about anymore in 2023.  We can always look for another answer if
we get complaints, I guess.

regards, tom lane




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-24 Thread Amit Kapila
On Wed, Jan 25, 2023 at 3:15 AM Peter Smith  wrote:
>
> 1.
> @@ -210,7 +210,7 @@ int logical_decoding_work_mem;
>  static const Size max_changes_in_memory = 4096; /* XXX for restore only */
>
>  /* GUC variable */
> -int logical_decoding_mode = LOGICAL_DECODING_MODE_BUFFERED;
> +int logical_replication_mode = LOGICAL_REP_MODE_BUFFERED;
>
>
> I noticed that the comment /* GUC variable */ is currently only above
> the logical_replication_mode, but actually logical_decoding_work_mem
> is a GUC variable too. Maybe this should be rearranged somehow, and
> then the comment changed from "GUC variable" -> "GUC variables"?
>

I think moving these variables together doesn't sound like a good idea
because the logical_decoding_work_mem variable is defined with other
related variables. Also, if we do the last comment, I think that
will obviate the need for this.

> ==
> src/backend/utils/misc/guc_tables.c
>
> @@ -4908,13 +4908,13 @@ struct config_enum ConfigureNamesEnum[] =
>   },
>
>   {
> - {"logical_decoding_mode", PGC_USERSET, DEVELOPER_OPTIONS,
> + {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS,
>   gettext_noop("Allows streaming or serializing each change in logical
> decoding."),
>   NULL,
>   GUC_NOT_IN_SAMPLE
>   },
> - &logical_decoding_mode,
> - LOGICAL_DECODING_MODE_BUFFERED, logical_decoding_mode_options,
> + &logical_replication_mode,
> + LOGICAL_REP_MODE_BUFFERED, logical_replication_mode_options,
>   NULL, NULL, NULL
>   },
>
> That gettext_noop string seems incorrect. I think Kuroda-san
> previously reported the same, but then you replied it has been fixed
> already [1]
>
> > I felt the description seems not to be suitable for current behavior.
> > A short description should be like "Sets a behavior of logical 
> > replication", and
> > further descriptions can be added in the long description.
> I adjusted the description here.
>
> But this doesn't look fixed to me. (??)
>

Okay, so, how about the following for the 0001 patch:
short desc: Controls when to replicate each change.
long desc: On the publisher, it allows streaming or serializing each
change in logical decoding.
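
In the guc_tables.c entry, that would look roughly like (a sketch):

	{
		{"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS,
			gettext_noop("Controls when to replicate each change."),
			gettext_noop("On the publisher, it allows streaming or serializing each change in logical decoding."),
			GUC_NOT_IN_SAMPLE
		},
		&logical_replication_mode,
		LOGICAL_REP_MODE_BUFFERED, logical_replication_mode_options,
		NULL, NULL, NULL
	},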

Then we can extend it as follows for the 0002 patch:
Controls when to replicate or apply each change
On the publisher, it allows streaming or serializing each change in
logical decoding. On the subscriber, it allows serialization of all
changes to files and notifies the parallel apply workers to read and
apply them at the end of the transaction.

> ==
> src/include/replication/reorderbuffer.h
>
> 3.
> @@ -18,14 +18,14 @@
>  #include "utils/timestamp.h"
>
>  extern PGDLLIMPORT int logical_decoding_work_mem;
> -extern PGDLLIMPORT int logical_decoding_mode;
> +extern PGDLLIMPORT int logical_replication_mode;
>
> Probably here should also be a comment to say "/* GUC variables */"
>

Okay, we can do this.

-- 
With Regards,
Amit Kapila.




Re: plpython vs _POSIX_C_SOURCE

2023-01-24 Thread Andres Freund
Hi,

On 2023-01-24 18:32:46 -0800, Andres Freund wrote:
> > I ran out of energy to test on aix with xlc, I spent way more time on this
> > than I have already. I'll pick it up later.
> 
> Building in the background now.

Also passes.


Thus I think getting rid of the #undefines is the best plan going
forward. Fewer complicated rules to follow => fewer rule violations.


Patches attached.


Greetings,

Andres Freund
>From eb3fb959b3db609db44437f783fc58cdb7f28e9f Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 24 Jan 2023 11:42:35 -0800
Subject: [PATCH v1 1/3] plpython: Avoid the need to redefine *printf macros

Until now we undefined and then redefined a lot of *printf macros due to
worries about conflicts with Python.h macro definitions. Current Python.h
doesn't define any *printf macros, and older versions just defined snprintf,
vsnprintf, guarded by #if defined(MS_WIN32) && !defined(HAVE_SNPRINTF).

Thus we can replace the undefine/define section with a single
 #define HAVE_SNPRINTF 1

Discussion: https://postgr.es/m/20230124165814.2njc7gnvubn2a...@awork3.anarazel.de
---
 src/pl/plpython/plpython.h | 48 +++---
 1 file changed, 3 insertions(+), 45 deletions(-)

diff --git a/src/pl/plpython/plpython.h b/src/pl/plpython/plpython.h
index 2af0d04d1f8..2d18fc2dc1b 100644
--- a/src/pl/plpython/plpython.h
+++ b/src/pl/plpython/plpython.h
@@ -30,17 +30,10 @@
 #undef _XOPEN_SOURCE
 
 /*
- * Sometimes python carefully scribbles on our *printf macros.
- * So we undefine them here and redefine them after it's done its dirty deed.
+ * Python versions <= 3.8 otherwise define a replacement, causing macro
+ * redefinition warnings.
  */
-#undef vsnprintf
-#undef snprintf
-#undef vsprintf
-#undef sprintf
-#undef vfprintf
-#undef fprintf
-#undef vprintf
-#undef printf
+#define HAVE_SNPRINTF 1
 
 #if defined(_MSC_VER) && defined(_DEBUG)
 /* Python uses #pragma to bring in a non-default libpython on VC++ if
@@ -63,41 +56,6 @@
 #undef TEXTDOMAIN
 #define TEXTDOMAIN PG_TEXTDOMAIN("plpython")
 
-/* put back our *printf macros ... this must match src/include/port.h */
-#ifdef vsnprintf
-#undef vsnprintf
-#endif
-#ifdef snprintf
-#undef snprintf
-#endif
-#ifdef vsprintf
-#undef vsprintf
-#endif
-#ifdef sprintf
-#undef sprintf
-#endif
-#ifdef vfprintf
-#undef vfprintf
-#endif
-#ifdef fprintf
-#undef fprintf
-#endif
-#ifdef vprintf
-#undef vprintf
-#endif
-#ifdef printf
-#undef printf
-#endif
-
-#define vsnprintf		pg_vsnprintf
-#define snprintf		pg_snprintf
-#define vsprintf		pg_vsprintf
-#define sprintf			pg_sprintf
-#define vfprintf		pg_vfprintf
-#define fprintf			pg_fprintf
-#define vprintf			pg_vprintf
-#define printf(...)		pg_printf(__VA_ARGS__)
-
 /*
  * Used throughout, so it's easier to just include it everywhere.
  */
-- 
2.38.0

>From 54f95177b0f32522862cc3589c587cbe3595304b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 24 Jan 2023 19:03:36 -0800
Subject: [PATCH v1 2/3] plpython: Stop undefining _POSIX_C_SOURCE,
 _XOPEN_SOURCE

We undefined them to avoid warnings about macro redefinitions. But we haven't
fully followed the necessary include order, since at least 147c2482542, in
2011.  Recently the combination of the include order rules not being followed
and undefining _POSIX_C_SOURCE started to cause a compile failure, starting
with 03023a2664f. Undefining _POSIX_C_SOURCE hides clock_gettime(), which is
referenced in an inline function as of 03023a2664f, whereas it was a macro
before.

After seeing some evidence that undefining _POSIX_C_SOURCE et al isn't
required, I tried to build postgres with plpython on most of our supported
platforms (except DragonFlyBSD and Illumos, but similar systems were tested),
with/without the #undefines. No compiler warning / behavioral difference.

The oldest supported python version, 3.2, defines _POSIX_C_SOURCE to 200112L
and _XOPEN_SOURCE to 600, whereas newer versions of python use 200809L/700
respectively. As _POSIX_C_SOURCE/_XOPEN_SOURCE will default to the newer
operating system on most platforms, it's possible that when using python 3.2
new warnings would be emitted - but that seems acceptable.

Discussion: https://postgr.es/m/20230124165814.2njc7gnvubn2a...@awork3.anarazel.de
---
 src/pl/plpython/plpython.h | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/src/pl/plpython/plpython.h b/src/pl/plpython/plpython.h
index 2d18fc2dc1b..91f6f8d8607 100644
--- a/src/pl/plpython/plpython.h
+++ b/src/pl/plpython/plpython.h
@@ -12,23 +12,13 @@
 #ifndef PLPYTHON_H
 #define PLPYTHON_H
 
-/*
- * Include order should be: postgres.h, other postgres headers, plpython.h,
- * other plpython headers.  (In practice, other plpython headers will also
- * include this file, so that they can compile standalone.)
- */
-#ifndef POSTGRES_H
+/* postgres.h needs to be included before Python.h, as usual */
+#if !defined(POSTGRES_H)
 #error postgres.h must be included before plpython.h
+#elif defined(Py_PYTHON_H)
+#error 

Re: Fix to enum hashing for dump and restore

2023-01-24 Thread Tom Lane
Andrew  writes:
> I have discovered a bug in one usage of enums. If a table with hash
> partitions uses an enum as a partitioning key, it can no longer be
> backed up and restored correctly. This is because enums are represented
> simply as oids, and the hash function for enums hashes that oid to
> determine partition distribution. Given the way oids are assigned, any
> dump+restore of a database with such a table may fail with the error
> "ERROR: new row for relation "TABLENAME" violates partition constraint".

Ugh, that was not well thought out :-(.  I suppose this isn't a problem
for pg_upgrade, which should preserve the enum value OIDs, but an
ordinary dump/restore will indeed hit this.

> I have written a patch to fix this bug (attached), by instead having the
> hashenum functions look up the enumsortorder ID of the value being
> hashed. These are deterministic across databases, and so allow for
> stable dump and restore.

Unfortunately, I'm not sure those are as deterministic as all that.
They are floats, so there's a question of roundoff error, not to
mention cross-platform variations in what a float looks like.  (At the
absolute minimum, I think we'd have to change your patch to force
consistent byte ordering of the floats.)  Actually though, roundoff
error wouldn't be a problem for the normal exact-integer values of
enumsortorder.  Where it could come into play is with the fractional
values used after you insert a value into the existing sort order.
And then the whole idea fails, because a dump/restore won't duplicate
those fractional values.

Another problem with this approach is that we can't get from here to there
without a guaranteed dump/reload failure, since it's quite unlikely that
the partition assignment will be the same when based on enumsortorder
as it was when based on OIDs.  Worse, it also breaks the pg_upgrade case.

I wonder if it'd work to make pg_dump force --load-via-partition-root
mode when a hashed partition key includes an enum.

regards, tom lane




Fix to enum hashing for dump and restore

2023-01-24 Thread Andrew
Hello,

I have discovered a bug in one usage of enums. If a table with hash
partitions uses an enum as a partitioning key, it can no longer be
backed up and restored correctly. This is because enums are represented
simply as oids, and the hash function for enums hashes that oid to
determine partition distribution. Given the way oids are assigned, any
dump+restore of a database with such a table may fail with the error
"ERROR: new row for relation "TABLENAME" violates partition constraint".

This can be reproduced with the following steps:

create database test;
\c test
create type colors as enum ('red', 'green', 'blue', 'yellow');
create table part (color colors) partition by hash(color);

create table prt_0 partition of part for values 
with (modulus 3, remainder 0);

create table prt_1 partition of part for values 
with (modulus 3, remainder 1);

create table prt_2 partition of part for values 
with (modulus 3, remainder 2);
insert into part values ('red');

/usr/local/pgsql/bin/pg_dump -d test -f /tmp/dump.sql
/usr/local/pgsql/bin/createdb test2
/usr/local/pgsql/bin/psql test2 -f /tmp/dump.sql


I have written a patch to fix this bug (attached), by instead having the
hashenum functions look up the enumsortorder ID of the value being
hashed. These are deterministic across databases, and so allow for
stable dump and restore. This admittedly comes at the performance cost
of doing a catalog lookup, but there is precedent for this in
e.g. hashrange and hashtext.
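
In rough form, the approach looks like this (a sketch, not the exact
patch text):

Datum
hashenum(PG_FUNCTION_ARGS)
{
    Oid         enumoid = PG_GETARG_OID(0);
    HeapTuple   tup;
    float4      sortorder;

    /* hash the stable sort order instead of the raw oid */
    tup = SearchSysCache1(ENUMOID, ObjectIdGetDatum(enumoid));
    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for enum %u", enumoid);
    sortorder = ((Form_pg_enum) GETSTRUCT(tup))->enumsortorder;
    ReleaseSysCache(tup);

    return hash_any((const unsigned char *) &sortorder, sizeof(sortorder));
}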

I look forward to your feedback on this, thank you!

Sincerely,
Andrew J Repp (VMware)

v1-0001-Change-enum-hashing-to-consider-enumsortorder.patch
Description: Binary data


Re: document the need to analyze partitioned tables

2023-01-24 Thread David Rowley
On Wed, 18 Jan 2023 at 22:15, Laurenz Albe  wrote:
> Attached is a new version of my patch that tries to improve the wording.

I had a look at this and agree that we should adjust the paragraph in
question if people are finding it confusing.

For your wording, I found I had a small problem with calling
the partitions of a partitioned table "normal tables" in:

+The partitions of a partitioned table are normal tables and get processed
+by autovacuum, but autovacuum doesn't process the partitioned table itself.

I started to adjust that but since the text is fairly short it turned
out quite different from what you had.

I ended up with:

+With partitioned tables, since these do not directly store tuples, these
+do not require autovacuum to perform any VACUUM
+operations.  Autovacuum simply performs a VACUUM on the
+partitioned table's partitions the same as it does with normal tables.
+However, the same is true for ANALYZE operations, and
+this can be problematic as there are various places in the query planner
+that attempt to make use of table statistics for partitioned tables when
+partitioned tables are queried.  For now, you can work around this problem
+by running a manual ANALYZE command on the partitioned
+table when the partitioned table is first populated, and again whenever
+the distribution of data in its partitions changes significantly.

which I've also attached in patch form.

I know there's been a bit of debate on the wording and a few patches,
so I may not be helping.  If nobody is against the above, then I don't
mind going ahead with it and backpatching to whichever version this
first applies to. I just felt I wasn't 100% happy with what was being
proposed.

David
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 759ea5ac9c..d327bdc564 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -860,10 +860,17 @@ analyze threshold = analyze base threshold + analyze 
scale factor * number of tu

 

-Partitioned tables are not processed by autovacuum.  Statistics
-should be collected by running a manual ANALYZE when it 
is
-first populated, and again whenever the distribution of data in its
-partitions changes significantly.
+With partitioned tables, since these do not directly store tuples, these
+do not require autovacuum to perform any VACUUM
+operations.  Autovacuum simply performs a VACUUM on the
+partitioned table's partitions the same as it does with normal tables.
+However, the same is true for ANALYZE operations, and
+this can be problematic as there are various places in the query planner
+that attempt to make use of table statistics for partitioned tables when
+partitioned tables are queried.  For now, you can work around this problem
+by running a manual ANALYZE command on the partitioned
+table when the partitioned table is first populated, and again whenever
+the distribution of data in its partitions changes significantly.

 



Re: GUCs to control abbreviated sort keys

2023-01-24 Thread Robert Haas
On Sat, Jan 21, 2023 at 8:16 PM Jeff Davis  wrote:
> This is fairly simple, so I plan to commit soon.

I find it a bit premature to include this comment in the very first
email. What if other people don't like the idea?

I would like to hear about the cases where abbreviated keys resulted
in a regression.

I'd also like to know whether there's a realistic possibility that
making this a run-time test could itself result in a regression.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: plpython vs _POSIX_C_SOURCE

2023-01-24 Thread Andres Freund
Hi,

On 2023-01-24 17:48:56 -0800, Andres Freund wrote:
> Also for autoconf, I needed to link
> $prefix/lib/python3.11/config-3.11/libpython3.11.a
> to
> $prefix/lib/libpython3.11.a
> That might be a python version difference or be related to building python
> with --enable-shared - but I saw other problems without --enable-shared.

That actually doesn't quite work right. One needs to either link to the file
by name (i.e. just $prefix/lib/libpython3.11.so instead of -lpython3.11), or
create a wrapper .a "manually". I.e.

ar crs $prefix/lib/libpython3.11.a $prefix/lib/libpython3.11.so

I tried quite a few things and got confused between the attempts.


> I ran out of energy to test on AIX with xlc; I've already spent way more
> time on this than I wanted. I'll pick it up later.

Building in the background now.

Greetings,

Andres Freund




Re: BF animal malleefowl reported an failure in 001_password.pl

2023-01-24 Thread Thomas Munro
On Fri, Jan 20, 2023 at 9:16 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > So I think we probably need something like the attached, which I was
> > originally trying to avoid.
>
> Yeah, something like that.  I also wonder if you don't need to think
> a bit harder about the ordering of the flag checks, in particular
> it seems like servicing reload_request before child_exit might be
> a good idea (remembering that child_exit might cause launching of
> new children, so we want to be up to speed on relevant settings).

Agreed, and done.




Re: plpython vs _POSIX_C_SOURCE

2023-01-24 Thread Andres Freund
Hi,

On 2023-01-24 16:16:06 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Python's _POSIX_C_SOURCE value is set to a specific value in their configure
> > script:
>
> > if test $define_xopen_source = yes
> > then
> >   ...
> >   AC_DEFINE(_POSIX_C_SOURCE, 200809L, Define to activate features from IEEE 
> > Stds 1003.1-2008)
> > fi
>
> Hm.  I looked into Python 3.2 (the oldest release we still support)
> and it has similar code but
>
>   AC_DEFINE(_POSIX_C_SOURCE, 200112L, Define to activate features from IEEE 
> Stds 1003.1-2001)
>
> So yeah it's fixed (or else not defined) for any particular Python
> release, but could vary across releases.

Looks like it changed in 3.3:

$ git grep -E 'AC_DEFINE.*_POSIX_C_SOURCE' v3.2 v3.3.0
v3.2:configure.in:  AC_DEFINE(_POSIX_C_SOURCE, 200112L, Define to activate 
features from IEEE Stds 1003.1-2001)
v3.3.0:configure.ac:  AC_DEFINE(_POSIX_C_SOURCE, 200809L, Define to activate 
features from IEEE Stds 1003.1-2008)

I'm not sure we need to care a lot about a build with python 3.3 triggering a
bunch of warnings.

Personally I'd just bump the python requirements to well above it - the last
3.2 release was Oct. 12, 2014.

Official EOL date:
Ver  Last Release  EOL Date
3.2  2014-10-11    2016-02-20
3.3  2017-09-19    2017-09-29
From 3.4 on there's just an official last release:
3.4  2019-03-18
3.5  2020-09-05
3.6  2021-09-04
3.7  2023-06-??



> > Solaris and AIX are the ones missing.
> > I guess I'll test them manually. It seems promising not to need this stuff
> > anymore?
>
> Given that hoverfly is AIX, I'm betting there's an issue there.

Doesn't look that way.

I found plenty problems on AIX, but all independent of _POSIX_C_SOURCE.


Both autoconf and meson builds seem to need externally specified
-D_LARGE_FILES=1 to build successfully when using plpython, otherwise we end
up with conflicting signatures with lseek. I see that Noah has that in his
buildfarm config.  ISTM that we should just move that into our build specs.




To get a 64 bit autoconf build to link plpython3.so correctly, I needed to
manually add -lpthread:
ld: 0711-317 ERROR: Undefined symbol: .pthread_init
...

I suspect Noah might not hit this, because one of the dependencies he has
enabled already adds it to the backend LDFLAGS.


Also for autoconf, I needed to link
$prefix/lib/python3.11/config-3.11/libpython3.11.a
to
$prefix/lib/libpython3.11.a
That might be a python version difference or be related to building python
with --enable-shared - but I saw other problems without --enable-shared.



I ran out of energy to test on AIX with xlc; I've already spent way more
time on this than I wanted. I'll pick it up later.



I also tested 64bit solaris. No relevant warnings (lots of other warnings
though), tests pass, with both acc and gcc.



> >> Anyway, I'm still of the opinion that what a11cf433413 tried to do
> >> was the best available fix, and we need to do whatever we have to do
> >> to plpython's headers to reinstate that coding rule.
>
> > You think it's not a viable path to just remove the _POSIX_C_SOURCE,
> > _XOPEN_SOURCE undefines?
>
> I think at the least that will result in warnings on some platforms,
> and at the worst in actual build problems.  Maybe there are no more
> of the latter a dozen years after the fact, but ...

I think it might be ok. I tested nearly all OSs that we support, with the
exception of DragonFlyBSD and Illumos, which both are very similar to tested
operating systems.


> Nice idea.  We did not have that option while we were using HAVE_SNPRINTF
> ourselves, but now that we are not I concur that this should work.

Cool.


> (I confirmed that their code looks the same in Python 3.2.)
> Note that you'd better make it
>
> #define HAVE_SNPRINTF 1
>
> or you risk macro-redefinition warnings.

Good point.

I guess I'll push that part first, given that we have agreement on how it
should look.

Greetings,

Andres Freund




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread Kyotaro Horiguchi
Sorry for bothering you with this.

At Tue, 24 Jan 2023 10:12:40 +, "Hayato Kuroda (Fujitsu)" 
 wrote in 
> > > Couldn't we maintain an additional static variable "last_applied"
> > > along with last_received?
> > >
> > 
> > It won't be easy to maintain the meaning of last_applied because there
> > are cases where we don't apply the change directly. For example, in
> > case of streaming xacts, we will just keep writing it to the file,
> > now, say, due to some reason, we have to send the feedback, then it
> > will not allow you to update the latest write locations. This would
> > then become different then what we are doing without the patch.
> > Another point to think about is that we also need to keep the variable
> > updated for keep-alive ('k') messages even though we don't apply
> > anything in that case. Still, other cases to consider are where we
> > have mix of streaming and non-streaming transactions.
> 
> I have tried to implement that, but it might be difficult because of a corner
> case related with the initial data sync.
> 
> First of all, I have made last_applied to update when
> 
> * transactions are committed, prepared, or aborted
> * apply worker receives keepalive message.

Yeah, I vaguely thought it would be enough for the update to happen just
before the existing send_feedback() calls. But it turned out to introduce
more unprincipled behavior..

> I thought during the initial data sync, we must not update the last applied
> triggered by keepalive messages, so following lines were added just after
> updating last_received.
> 
> ```
> +   if (last_applied < end_lsn && AllTablesyncsReady())
> +       last_applied = end_lsn;
> ```

Maybe the name "last_applied" confused you. As I mentioned in
another message, the variable points to the remote LSN of the last
"processed" 'w/k' message.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: GUCs to control abbreviated sort keys

2023-01-24 Thread Justin Pryzby
On Sat, Jan 21, 2023 at 05:16:01PM -0800, Jeff Davis wrote:

> +  xreflabel="sort_abbreviated_keys">
> +  sort_abbreviated_keys (boolean)
> +  
> +   sort_abbreviated_keys configuration 
> parameter
> +  
> +  
> +  
> +   
> +Enables or disables the use of abbreviated sort keys, an 
> optimization,
> +if applicable. The default is true. Disabling may

I think "an optimization, if applicable" is either too terse, or somehow
wrong.  Maybe:

| Enables or disables the use of abbreviated keys, a sort optimization...

> +optimization could return wrong results. Set to
> +true if certain that 
> strxfrm()
> +can be trusted.

"if you are certain"; or "if it is ..."

-- 
Justin




bitscan forward/reverse on Windows

2023-01-24 Thread John Naylor
Attached is a quick-and-dirty attempt to add MSVC support for the
rightmost/leftmost-one-pos functions.

0001 adds asserts to the existing coding.
0002 adds MSVC support. Tests pass on CI, but it's of course possible that
there is some bug that prevents hitting the fastpath. I've mostly used
the platform specific types, so some further cleanup might be needed.
0003 tries one way to reduce the duplication that arose in 0002. Maybe
there is a better way.

-- 
John Naylor
EDB: http://www.enterprisedb.com
From 9390a74f1712d58c88ac513afff8d878071c040c Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Tue, 24 Jan 2023 15:57:53 +0700
Subject: [PATCH v3 2/3] Add MSVC support for bitscan reverse/forward

---
 src/include/port/pg_bitutils.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index 9a53f53d11..ce04f4ffad 100644
--- a/src/include/port/pg_bitutils.h
+++ b/src/include/port/pg_bitutils.h
@@ -13,6 +13,10 @@
 #ifndef PG_BITUTILS_H
 #define PG_BITUTILS_H
 
+#ifdef _MSC_VER
+#include 
+#endif
+
 extern PGDLLIMPORT const uint8 pg_leftmost_one_pos[256];
 extern PGDLLIMPORT const uint8 pg_rightmost_one_pos[256];
 extern PGDLLIMPORT const uint8 pg_number_of_ones[256];
@@ -27,6 +31,9 @@ pg_leftmost_one_pos32(uint32 word)
 {
 #ifdef HAVE__BUILTIN_CLZ
 	int			bitscan_result;
+#elif defined(_MSC_VER)
+	unsigned long bitscan_result;
+	unsigned char non_zero; // XXX can this be bool?
 #endif
 
 #if !defined(HAVE__BUILTIN_CLZ) || defined(USE_ASSERT_CHECKING)
@@ -45,6 +52,11 @@ pg_leftmost_one_pos32(uint32 word)
 	bitscan_result = 31 - __builtin_clz(word);
 	Assert(bitscan_result == result);
 	return bitscan_result;
+#elif defined(_MSC_VER)
+	non_zero = _BitScanReverse(&bitscan_result, word);
+	Assert(non_zero);
+	Assert(bitscan_result == result);
+	return bitscan_result;
 #else
 	return result;
 #endif			/* HAVE__BUILTIN_CLZ */
@@ -59,6 +71,9 @@ pg_leftmost_one_pos64(uint64 word)
 {
 #ifdef HAVE__BUILTIN_CLZ
 	int			bitscan_result;
+#elif defined(_MSC_VER)
+	unsigned long bitscan_result;
+	unsigned char non_zero; // XXX can this be bool?
 #endif
 
 #if !defined(HAVE__BUILTIN_CLZ) || defined(USE_ASSERT_CHECKING)
@@ -83,6 +98,11 @@ pg_leftmost_one_pos64(uint64 word)
 #endif			/* HAVE_LONG_... */
 	Assert(bitscan_result == result);
 	return bitscan_result;
+#elif defined(_MSC_VER)
+	non_zero = _BitScanReverse64(&bitscan_result, word);
+	Assert(non_zero);
+	Assert(bitscan_result == result);
+	return bitscan_result;
 #else
 	return result;
 #endif			/* HAVE__BUILTIN_CLZ */
@@ -99,6 +119,10 @@ pg_rightmost_one_pos32(uint32 word)
 #ifdef HAVE__BUILTIN_CTZ
 	const uint32 orig_word = word;
 	int			bitscan_result;
+#elif defined(_MSC_VER)
+	const unsigned long orig_word = word;
+	unsigned long bitscan_result;
+	unsigned char non_zero; // XXX can this be bool?
 #endif
 
 #if !defined(HAVE__BUILTIN_CTZ) || defined(USE_ASSERT_CHECKING)
@@ -118,6 +142,11 @@ pg_rightmost_one_pos32(uint32 word)
 	bitscan_result = __builtin_ctz(orig_word);
 	Assert(bitscan_result == result);
 	return bitscan_result;
+#elif defined(_MSC_VER)
+	non_zero = _BitScanForward(&bitscan_result, orig_word);
+	Assert(non_zero);
+	Assert(bitscan_result == result);
+	return bitscan_result;
 #else
 	return result;
 #endif			/* HAVE__BUILTIN_CTZ */
@@ -133,6 +162,10 @@ pg_rightmost_one_pos64(uint64 word)
 #ifdef HAVE__BUILTIN_CTZ
 	const uint64 orig_word = word;
 	int			bitscan_result;
+#elif defined(_MSC_VER)
+	const unsigned __int64 orig_word = word;
+	unsigned long bitscan_result;
+	unsigned char non_zero; // XXX can this be bool?
 #endif
 
 #if !defined(HAVE__BUILTIN_CTZ) || defined(USE_ASSERT_CHECKING)
@@ -158,6 +191,11 @@ pg_rightmost_one_pos64(uint64 word)
 #endif			/* HAVE_LONG_... */
 	Assert(bitscan_result == result);
 	return bitscan_result;
+#elif defined(_MSC_VER)
+	non_zero = _BitScanForward64(&bitscan_result, orig_word);
+	Assert(non_zero);
+	Assert(bitscan_result == result);
+	return bitscan_result;
 #else
 	return result;
 #endif			/* HAVE__BUILTIN_CTZ */
-- 
2.39.0

From 88b98a3fe8d9b5f9a10a839e65b34c43c03e33b2 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Tue, 24 Jan 2023 14:39:26 +0700
Subject: [PATCH v3 1/3] Add asserts to verify bitscan intrinsics

---
 src/include/port/pg_bitutils.h | 86 --
 1 file changed, 60 insertions(+), 26 deletions(-)

diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index a49a9c03d9..9a53f53d11 100644
--- a/src/include/port/pg_bitutils.h
+++ b/src/include/port/pg_bitutils.h
@@ -26,10 +26,11 @@ static inline int
 pg_leftmost_one_pos32(uint32 word)
 {
 #ifdef HAVE__BUILTIN_CLZ
-	Assert(word != 0);
+	int			bitscan_result;
+#endif
 
-	return 31 - __builtin_clz(word);
-#else
+#if !defined(HAVE__BUILTIN_CLZ) || defined(USE_ASSERT_CHECKING)
+	int			result;
 	int			shift = 32 - 8;
 
 	Assert(word != 0);
@@ -37,7 +38,15 @@ pg_leftmost_one_pos32(uint32 word)
 	while ((word >> 

Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-01-24 Thread Masahiko Sawada
On Mon, Jan 23, 2023 at 8:20 PM John Naylor
 wrote:
>
> On Mon, Jan 16, 2023 at 3:18 PM Masahiko Sawada  wrote:
> >
> > On Mon, Jan 16, 2023 at 2:02 PM John Naylor
> >  wrote:
>
> In v21, all of your v20 improvements to the radix tree template and test have 
> been squashed into 0003, with one exception: v20-0010 (recursive freeing of 
> shared mem), which I've attached separately (for flexibility) as v21-0006. I 
> believe one of your earlier patches had a new DSA function for freeing memory 
> more quickly -- was there a problem with that approach? I don't recall where 
> that discussion went.

Hmm, I don't remember proposing such a patch either.

One idea to address it would be that we pass a shared memory to
RT_CREATE() and we create a DSA area dedicated to the radix tree in
place. We should return the created DSA area along with the radix tree
so that the caller can use it (e.g., for dsa_get_handle(), dsa_pin(),
and dsa_pin_mapping() etc). In RT_FREE(), we just detach from the DSA
area. A downside of this idea would be that one DSA area only for a
radix tree is always required.

Another idea would be to allocate a big enough DSA area and carve out
small chunks of memory for nodes from there. But that would introduce
more complexity, so I prefer to avoid it.

FYI the current design is inspired by dshash.c. In dshash_destroy(),
we dsa_free() each element allocated by dshash.c.

>
> > + * XXX: Most functions in this file have two variants for inner nodes and 
> > leaf
> > + * nodes, therefore there are duplication codes. While this sometimes 
> > makes the
> > + * code maintenance tricky, this reduces branch prediction misses when 
> > judging
> > + * whether the node is a inner node of a leaf node.
> >
> > This comment seems to be out-of-date since we made it a template.
>
> Done in 0020, along with a bunch of other comment editing.
>
> > The following macros are defined but not undefined in radixtree.h:
>
> Fixed in v21-0018.
>
> Also:
>
> 0007 makes the value type configurable. Some debug functionality still 
> assumes integer type, but I think the rest is agnostic.

radixtree_search_impl.h still assumes that the value type is an
integer type as follows:

#ifdef RT_NODE_LEVEL_LEAF
RT_VALUE_TYPE   value = 0;

Assert(RT_NODE_IS_LEAF(node));
#else

Also, if we make the value type configurable, I think it's better to
pass a pointer to the value to RT_SET() instead of copying the value,
since the value size could be large.

> 0010 turns node4 into node3, as discussed, going from 48 bytes to 32.
> 0012 adopts the benchmark module to the template, and adds meson support 
> (builds with warnings, but okay because not meant for commit).
>
> The rest are cleanups, small refactorings, and more comment rewrites. I've 
> kept them separate for visibility. Next patch can squash them unless there is 
> any discussion.

0008 patch

for (int i = 0; i < RT_SIZE_CLASS_COUNT; i++)
-   fprintf(stderr, "%s\tinner_size %zu\tinner_blocksize
%zu\tleaf_size %zu\tleaf_blocksize %zu\n",
+   fprintf(stderr, "%s\tinner_size %zu\tleaf_size %zu\t%zu\n",
RT_SIZE_CLASS_INFO[i].name,
RT_SIZE_CLASS_INFO[i].inner_size,
-   RT_SIZE_CLASS_INFO[i].inner_blocksize,
-   RT_SIZE_CLASS_INFO[i].leaf_size,
-   RT_SIZE_CLASS_INFO[i].leaf_blocksize);
+   RT_SIZE_CLASS_INFO[i].leaf_size);

There is an additional '%zu' at the end of the format string.

---
0011 patch

+ * 1. With 5 or more kinds, gcc tends to use a jump table for switch
+ *statments.

typo: s/statments/statements/

The rest look good to me. I'll incorporate these fixes in the next
version patch.

>
> > > uint32 is how we store the block number, so this too small and will wrap 
> > > around on overflow. int64 seems better.
> >
> > Agreed, will fix.
>
> Great, but it's now uint64, not int64. All the large counters in struct 
> LVRelState, for example, are signed integers, as the usual practice. Unsigned 
> ints are "usually" for things like bit patterns and where explicit wraparound 
> is desired. There's probably more that can be done here to change to signed 
> types, but I think it's still a bit early to get to that level of nitpicking. 
> (Soon, I hope :-) )

Agreed. I'll change it in the next version patch.

>
> > > + * We calculate the maximum bytes for the TidStore in different ways
> > > + * for non-shared case and shared case. Please refer to the comment
> > > + * TIDSTORE_MEMORY_DEDUCT for details.
> > > + */
> > >
> > > Maybe the #define and comment should be close to here.
> >
> > Will fix.
>
> For this, I intended that "here" meant "in or just above the function".
>
> +#define TIDSTORE_LOCAL_MAX_MEMORY_DEDUCT (1024L * 70) /* 70kB */
> +#define TIDSTORE_SHARED_MAX_MEMORY_RATIO_PO2 (float) 0.75
> +#define TIDSTORE_SHARED_MAX_MEMORY_RATIO (float) 0.6
>
> 

Re: Add LZ4 compression in pg_dump

2023-01-24 Thread Justin Pryzby
On Tue, Jan 24, 2023 at 03:56:20PM +, gkokola...@pm.me wrote:
> On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby 
>  wrote:
> > On Mon, Jan 23, 2023 at 05:31:55PM +, gkokola...@pm.me wrote:
> > 
> > > Please find attached v23 which reintroduces the split.
> > > 
> > > 0001 is reworked to have a smaller footprint than before. Also in an 
> > > attempt
> > > to facilitate the readability, 0002 splits the API's and the uncompressed
> > > implementation in separate files.
> > 
> > Thanks for updating the patch. Could you address the review comments I
> > sent here ?
> > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com
> 
> Please find v24 attached.

Thanks for updating the patch.

In 001, RestoreArchive() does:

> -#ifndef HAVE_LIBZ
> -   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&
> -   AH->PrintTocDataPtr != NULL)
> +   supports_compression = false;
> +   if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE ||
> +   AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> +   supports_compression = true;
> +
> +   if (AH->PrintTocDataPtr != NULL)
> {
> for (te = AH->toc->next; te != AH->toc; te = te->next)
> {
> if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
> -   pg_fatal("cannot restore from compressed 
> archive (compression not supported in this installation)");
> +   {
> +#ifndef HAVE_LIBZ
> +   if (AH->compression_spec.algorithm == 
> PG_COMPRESSION_GZIP)
> +   supports_compression = false;
> +#endif
> +   if (supports_compression == false)
> +   pg_fatal("cannot restore from 
> compressed archive (compression not supported in this installation)");
> +   }
> }
> }
> -#endif

This first checks if the algorithm is implemented, and then checks if
the algorithm is supported by the current build - that confused me for a
bit.  It seems unnecessary to check for unimplemented algorithms before
looping.  That also requires referencing both GZIP and LZ4 in two
places.

I think it could be written to avoid the need to change for added
compression algorithms:

+   if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
+   {
+       /* Check if the compression algorithm is supported */
+       pg_compress_specification spec;
+
+       parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec);
+       if (spec.parse_error != NULL)
+           pg_fatal(spec.parse_error);
+   }

Or maybe add a new function to compression.c to indicate whether a given
algorithm is supported.

That would also indicate *which* compression library isn't supported.
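
For example, something roughly like this (the function name, its placement
in src/common/compression.c, and the exact set of cases are guesses, not
code from the patch):

#ifndef FRONTEND
#include "postgres.h"
#else
#include "postgres_fe.h"
#endif

#include "common/compression.h"

bool
supports_compression_algorithm(pg_compress_algorithm algorithm)
{
	switch (algorithm)
	{
		case PG_COMPRESSION_NONE:
			return true;
		case PG_COMPRESSION_GZIP:
#ifdef HAVE_LIBZ
			return true;
#else
			return false;
#endif
		case PG_COMPRESSION_LZ4:
#ifdef USE_LZ4
			return true;
#else
			return false;
#endif
		case PG_COMPRESSION_ZSTD:
#ifdef USE_ZSTD
			return true;
#else
			return false;
#endif
	}
	return false;				/* keep compiler quiet */
}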

Other than that, I think 001 is ready.

002/003 use these names, which I think are too similar - initially I
didn't even realize there were two separate functions (each with a
second stub function to handle the case of unsupported compression):

+extern void InitCompressorGzip(CompressorState *cs, const pg_compress_specification compression_spec);
+extern void InitCompressGzip(CompressFileHandle *CFH, const pg_compress_specification compression_spec);

+extern void InitCompressorLZ4(CompressorState *cs, const pg_compress_specification compression_spec);
+extern void InitCompressLZ4(CompressFileHandle *CFH, const pg_compress_specification compression_spec);

typo:
s/not build with/not built with/

Should AllocateCompressor() set cs->compression_spec, rather than doing
it in each compressor?

Thanks for considering.

-- 
Justin




Re: Improve logging when using Huge Pages

2023-01-24 Thread Justin Pryzby
On Mon, Jan 23, 2023 at 05:33:35PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2023-01-23 19:21:00 -0600, Justin Pryzby wrote:
> > Michael seemed to support this idea and nobody verbalized hatred of it,
> > so I implemented it.  In v15, we have shared_memory_size_in_huge_pages,
> > so this adds effective_huge_pages.
> > 
> > Feel free to suggest a better name.
> 
> Hm. Should this be a GUC? There's a reason shared_memory_size_in_huge_pages is
> one - it's so it's accessible via -C. But here we could make it a function or
> whatnot as well.

I have no strong opinion, but a few minor arguments in favour of a GUC:

  - the implementation parallels huge_pages, huge_page_size, and
shared_memory_size_in_huge_pages;
  - it's short;
  - it's glob()able with the others:

postgres=# \dconfig *huge*
          List of configuration parameters
            Parameter             | Value
----------------------------------+-------
 effective_huge_pages             | off
 huge_pages                       | try
 huge_page_size                   | 0
 shared_memory_size_in_huge_pages | 12

> I'm ok with this being a GUC, it doesn't feel elegant, but I suspect there's
> no realistic elegant answer.

BTW, I didn't realize it when I made the suggestion, nor when I wrote
the patch, but a GUC was implemented in the v2 patch.
https://www.postgresql.org/message-id/TU4PR8401MB1152CB4FEB99658BC6B35543EECF9%40TU4PR8401MB1152.NAMPRD84.PROD.OUTLOOK.COM

The original proposal was to change the elog(DEBUG1, "mmap..") to LOG,
and the thread seems to have derailed out of concern for a hypothetical
user who was averse to an additional line of log messages during server
start.

I don't share that concern, but GUC or function seems better anyway,
since the log message is not easily available (except maybe, sometimes,
shortly after the server restart).  I'm currently checking for huge
pages in a nagios script by grepping /proc/pid/smaps (I *could* make use
of a logfile if that was available, but it's better if it's a persistent
state/variable).

Whether it's a GUC or a function, I propose to name it hugepages_active.
If there are no objections, I'll add it to the CF.
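
If it ends up as a GUC, I'd expect it to look roughly like the existing
preset parameters.  A sketch of a ConfigureNamesBool[] entry in
guc_tables.c (the name, backing variable, and flags are assumptions
modeled on read-only GUCs such as data_checksums, not committed code):

static bool huge_pages_active = false;	/* set while attaching shared memory */

{
	{"hugepages_active", PGC_INTERNAL, PRESET_OPTIONS,
		gettext_noop("Indicates whether huge pages are in use."),
		NULL,
		GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
	},
	&huge_pages_active,
	false,
	NULL, NULL, NULL
},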

-- 
Justin




Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Michael Paquier
On Tue, Jan 24, 2023 at 07:42:06PM -0500, Tom Lane wrote:
> That systemd behavior affects IPC resources regardless of what process
> created them.

Thanks, my memory was fuzzy regarding that.  I am curious if the error
in the recovery tests will persist with that set up.  The next run
will be in a few hours, so let's see..
--
Michael


signature.asc
Description: PGP signature


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread Kyotaro Horiguchi
At Tue, 24 Jan 2023 14:22:19 +0530, Amit Kapila  wrote 
in 
> On Tue, Jan 24, 2023 at 12:44 PM Peter Smith  wrote:
> >
> > On Tue, Jan 24, 2023 at 5:58 PM Amit Kapila  wrote:
> > >
> > > On Tue, Jan 24, 2023 at 8:15 AM Kyotaro Horiguchi
> > >  wrote:
> > > >
> > > > > Attached the updated patch v19.
> > > >
> > > > + maybe_delay_apply(TransactionId xid, TimestampTz finish_ts)
> > > >
> > > > I find this spelling strange.  How about maybe_apply_delay()?
> > > >
> > >
> > > +1.
> >
> > It depends on how you read it. I read it like this:
> >
> > maybe_delay_apply === means "maybe delay [the] apply"
> > (which is exactly what the function does)
> >
> > versus
> >
> > maybe_apply_delay === means "maybe [the] apply [needs a] delay"
> > (which is also correct, but it seemed a more awkward way to say it IMO)
> >
> 
> This matches more with GUC and all other usages of variables in the
> patch. So, I still prefer the second one.

I read it as "maybe apply [the] delay [to something suggested by the
context]". If we go the first way, I will name it as
"maybe_delay_apply_change" or something that has an extra word.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread Kyotaro Horiguchi
In short, I'd like to propose renaming the parameter in_delayed_apply
of send_feedback to "has_unprocessed_change".

At Tue, 24 Jan 2023 12:27:58 +0530, Amit Kapila  wrote 
in 
> > send_feedback():
> > +* If the subscriber side apply is delayed (because of time-delayed
> > +* replication) then do not tell the publisher that the received 
> > latest
> > +* LSN is already applied and flushed, otherwise, it leads to the
> > +* publisher side making a wrong assumption of logical replication
> > +* progress. Instead, we just send a feedback message to avoid a 
> > publisher
> > +* timeout during the delay.
> >  */
> > -   if (!have_pending_txes)
> > +   if (!have_pending_txes && !in_delayed_apply)
> > flushpos = writepos = recvpos;
> >
> > Honestly I don't like this wart. The reason for this is the function
> > assumes recvpos = applypos but we actually call it while holding
> > unapplied changes, that is, applypos < recvpos.
> >
> > Couldn't we maintain an additional static variable "last_applied"
> > along with last_received?
> >
> 
> It won't be easy to maintain the meaning of last_applied because there
> are cases where we don't apply the change directly. For example, in
> case of streaming xacts, we will just keep writing it to the file,
> now, say, due to some reason, we have to send the feedback, then it
> will not allow you to update the latest write locations. This would
> then become different then what we are doing without the patch.
> Another point to think about is that we also need to keep the variable
> updated for keep-alive ('k') messages even though we don't apply
> anything in that case. Still, other cases to consider are where we
> have mix of streaming and non-streaming transactions.

Yeah.  Even though I named it "last_applied", its objective is to have
get_flush_position return the correct have_pending_txes without a hint
from callers, that is, "let get_flush_position know whether
store_flush_position has been called with the last received data".
Anyway, I tried that but didn't find a clean and simple way. However,
while working on it, I realized what about the code had confused me.

+static void send_feedback(XLogRecPtr recvpos, bool force, bool requestReply,
+ bool in_delayed_apply);

The name "in_delayed_apply" doesn't donsn't give me an idea of what
the function should do for it. If it is named "has_unprocessed_change",
I think it makes sense that send_feedback should think there may be an
outstanding transaction that is not known to the function.


So, my conclusion here is that I'd like to propose changing the parameter
name to "has_unprocessed_change".


> >  In this case the condition cited above
> > would be as follows and in_delayed_apply will become unnecessary.
> >
> > +   if (!have_pending_txes && last_received == last_applied)
> >
> > The function is a static function and always called with a variable
> > last_received that has the same scope with the function, as the first

Sorry for the noise, I misread it. Maybe I mistook the function-scoped
variable for a file-scoped one.. Thus the discussion above is moot.

> > parameter. Thus we can remove the first parameter then let the
> > function directly look at the both two varaibles instead.
> >
> 
> I think this is true without this patch, so why has that not been
> followed in the first place? One comment I see in this regard is as
> below:
> 
> /* It's legal to not pass a recvpos */
> if (recvpos < last_recvpos)
> recvpos = last_recvpos;

Sorry, I don't understand this. It is just part of the ratchet
mechanism for reporting the last received LSN.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: heapgettup refactoring

2023-01-24 Thread Melanie Plageman
On Tue, Jan 24, 2023 at 04:17:23PM -0500, Melanie Plageman wrote:
> Thanks for taking a look!
> 
> On Mon, Jan 23, 2023 at 6:08 AM David Rowley  wrote:
> >
> > On Thu, 19 Jan 2023 at 00:04, Peter Eisentraut
> >  wrote:
> > > In your v2 patch, you remove these assertions:
> > >
> > > -   /* check that rs_cindex is in sync */
> > > -   Assert(scan->rs_cindex < scan->rs_ntuples);
> > > -   Assert(lineoff == scan->rs_vistuples[scan->rs_cindex]);
> > >
> > > Is that intentional?
> > >
> > > I don't see any explanation, or some other equivalent code appearing
> > > elsewhere to replace this.
> >
> > I guess it's because those asserts are not relevant unless
> > heapgettup_no_movement() is being called from heapgettup_pagemode().
> > Maybe they can be put back along the lines of:
> >
> > Assert((scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) == 0 ||
> > scan->rs_cindex < scan->rs_ntuples);
> > Assert((scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) == 0 || lineoff ==
> > scan->rs_vistuples[scan->rs_cindex]);
> >
> > but it probably would be cleaner to just do an: if
> > (scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) { Assert(...);
> > Assert(...); }
> 
> I prefer the first method and have implemented that in attached v6.
> 
> > The only issue I see with that is that we don't seem to have anywhere
> > in the regression tests that call heapgettup_no_movement() when
> > rs_flags have SO_ALLOW_PAGEMODE. At least, adding an elog(NOTICE) to
> > heapgettup() just before calling heapgettup_no_movement() does not
> > seem to cause make check to fail.  I wonder if any series of SQL
> > commands would allow us to call heapgettup_no_movement() from
> > heapgettup()?
> 
> So, the places in which we set scan direction to no movement include:
> - explain analyze on a ctas with no data
>   EXPLAIN ANALYZE CREATE TABLE foo AS SELECT 1 WITH NO DATA;
>   However, in standard_ExecutorRun() we only call ExecutePlan() if the
>   ScanDirection is not no movement, so this wouldn't hit our code
> - PortalRunSelect
> - PersistHoldablePortal()
> 
> I can't say I know enough about portals currently to design a test that
> will hit this code, but I will poke around some more.
 
I don't think we can write a test for this after all. I've started
another thread on the topic over here:

https://www.postgresql.org/message-id/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY%3D_T8QEqZqOL02rpi2bw%40mail.gmail.com




heapgettup() with NoMovementScanDirection unused in core?

2023-01-24 Thread Melanie Plageman
Hi,

David Rowley and I were discussing how to test the
NoMovementScanDirection case for heapgettup() and heapgettup_pagemode()
in [1] (since there is currently no coverage). We are actually
wondering if it is dead code (in core).

This is a link to the code in question on github in [2] (side note: is
there a more evergreen way to do this that doesn't involve pasting a
hundred lines of code into this email? You need quite a few lines of
context for it to be clear what code I am talking about.)

standard_ExecutorRun() doesn't run ExecutePlan() if scan direction is no
movement.

if (!ScanDirectionIsNoMovement(direction))
{
...
ExecutePlan(estate,
queryDesc->planstate,
  }

And other users of heapgettup() through table_scan_getnextslot() and the
like all seem to pass ForwardScanDirection as the direction.

A skilled code archaeologist brought our attention to adbfab119b308a
which appears to remove the only users in the core codebase calling
heapgettup() and heapgettup_pagemode() with NoMovementScanDirection (and
those users were not themselves used).

Perhaps we can remove support for NoMovementScanDirection with
heapgettup()? Unless someone knows of a good use case for a table AM to
do this?

- Melanie

[1] 
https://www.postgresql.org/message-id/CAAKRu_ZyiXwWS1WXSZneoy%2BsjoH_%2BF5UhO-1uFhyi-u0d6z%2BfA%40mail.gmail.com
[2] 
https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam.c#L656




Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Jan 25, 2023 at 01:20:39PM +1300, Thomas Munro wrote:
>> Something to do with
>> https://www.postgresql.org/docs/current/kernel-resources.html#SYSTEMD-REMOVEIPC
>> ?

> Still this is unrelated?  This is a buildfarm instance, so the backend
> does not run with systemd.

That systemd behavior affects IPC resources regardless of what process
created them.

regards, tom lane




Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Michael Paquier
On Wed, Jan 25, 2023 at 01:20:39PM +1300, Thomas Munro wrote:
> Something to do with
> https://www.postgresql.org/docs/current/kernel-resources.html#SYSTEMD-REMOVEIPC
> ?

Still this is unrelated?  This is a buildfarm instance, so the backend
does not run with systemd.

> The failure I saw looked like a straight up case of the bug reported
> in this thread to me.

Okay...
--
Michael


signature.asc
Description: PGP signature


Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Thomas Munro
On Wed, Jan 25, 2023 at 1:02 PM Michael Paquier  wrote:
> Well, this host has a problem, for what looks like a kernel issue, I
> guess..  This is repeatable across all the branches, randomly, with
> various errors with the POSIX DSM implementation:
> # [63cf68b7.5e5a:1] ERROR:  could not open shared memory segment 
> "/PostgreSQL.543738922": No such file or directory
> # [63cf68b7.5e58:6] ERROR:  could not open shared memory segment 
> "/PostgreSQL.543738922": No such file or directory

Something to do with
https://www.postgresql.org/docs/current/kernel-resources.html#SYSTEMD-REMOVEIPC
?

The failure I saw looked like a straight up case of the bug reported
in this thread to me.




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread Kyotaro Horiguchi
At Tue, 24 Jan 2023 11:45:36 +0530, Amit Kapila  wrote 
in 
> Personally, I would prefer the above LOGs to be in reverse order as it
> doesn't make much sense to me to first say that we are skipping
> changes and then say the transaction is delayed. What do you think?

In the first place, I misunderstood maybe_start_skipping_changes(),
which doesn't actually skip changes. So... sorry for the noise.

For the record, I agree that the current order is right.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Generating code for query jumbling through gen_node_support.pl

2023-01-24 Thread Michael Paquier
On Tue, Jan 24, 2023 at 03:57:56PM +0900, Michael Paquier wrote:
> Makes sense.  That would be my intention if 0004 is the most
> acceptable and splitting things makes things a bit easier to review.

There was a silly mistake in 0004 where the jumbling code relied on
compute_query_id rather than utility_query_id, so fixed and rebased as
of v7 attached.
--
Michael
From e1a35c02927ed38b04b1f628c460ef05f706621f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 24 Jan 2023 15:29:06 +0900
Subject: [PATCH v7 3/4] Support for automated query jumble with all Nodes

This applies query jumbling in a consistent way to all the Nodes,
including DDLs & friends.
---
 src/include/nodes/bitmapset.h |   2 +-
 src/include/nodes/nodes.h |  13 +-
 src/include/nodes/parsenodes.h| 126 ++--
 src/include/nodes/primnodes.h | 268 
 src/backend/nodes/README  |   1 +
 src/backend/nodes/gen_node_support.pl | 114 +++-
 src/backend/nodes/meson.build |   2 +-
 src/backend/nodes/queryjumblefuncs.c  | 852 ++
 8 files changed, 519 insertions(+), 859 deletions(-)

diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 0dca6bc5fa..3d2225e1ae 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -50,7 +50,7 @@ typedef int32 signedbitmapword; /* must be the matching signed type */
 
 typedef struct Bitmapset
 {
-	pg_node_attr(custom_copy_equal, special_read_write)
+	pg_node_attr(custom_copy_equal, special_read_write, no_query_jumble)
 
 	NodeTag		type;
 	int			nwords;			/* number of words in array */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 10752e8011..d7a9e38436 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -53,16 +53,20 @@ typedef enum NodeTag
  * - custom_read_write: Has custom implementations in outfuncs.c and
  *   readfuncs.c.
  *
+ * - custom_query_jumble: Has custom implementation in queryjumblefuncs.c.
+ *
  * - no_copy: Does not support copyObject() at all.
  *
  * - no_equal: Does not support equal() at all.
  *
  * - no_copy_equal: Shorthand for both no_copy and no_equal.
  *
+ * - no_query_jumble: Does not support JumbleQuery() at all.
+ *
  * - no_read: Does not support nodeRead() at all.
  *
- * - nodetag_only: Does not support copyObject(), equal(), outNode(),
- *   or nodeRead().
+ * - nodetag_only: Does not support copyObject(), equal(), jumbleQuery()
+ *   outNode() or nodeRead().
  *
  * - special_read_write: Has special treatment in outNode() and nodeRead().
  *
@@ -97,6 +101,11 @@ typedef enum NodeTag
  * - equal_ignore_if_zero: Ignore the field for equality if it is zero.
  *   (Otherwise, compare normally.)
  *
+ * - query_jumble_ignore: Ignore the field for the query jumbling.
+ *
+ * - query_jumble_location: Mark the field as a location to track.  This is
+ *   only allowed for integer fields that include "location" in their name.
+ *
  * - read_as(VALUE): In nodeRead(), replace the field's value with VALUE.
  *
  * - read_write_ignore: Ignore the field for read/write.  This is only allowed
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 89335d95e7..f99fb5e909 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -116,6 +116,11 @@ typedef uint64 AclMode;			/* a bitmask of privilege bits */
  *
  *	  Planning converts a Query tree into a Plan tree headed by a PlannedStmt
  *	  node --- the Query structure is not used by the executor.
+ *
+ *	  All the fields ignored for the query jumbling are not semantically
+ *	  significant (such as alias names), as is ignored anything that can
+ *	  be deduced from child nodes (else we'd just be double-hashing that
+ *	  piece of information).
  */
 typedef struct Query
 {
@@ -124,45 +129,47 @@ typedef struct Query
 	CmdType		commandType;	/* select|insert|update|delete|merge|utility */
 
 	/* where did I come from? */
-	QuerySource querySource;
+	QuerySource querySource pg_node_attr(query_jumble_ignore);
 
 	/*
 	 * query identifier (can be set by plugins); ignored for equal, as it
-	 * might not be set; also not stored
+	 * might not be set; also not stored.  This is the result of the query
+	 * jumble, hence ignored.
 	 */
-	uint64		queryId pg_node_attr(equal_ignore, read_write_ignore, read_as(0));
+	uint64		queryId pg_node_attr(equal_ignore, query_jumble_ignore, read_write_ignore, read_as(0));
 
 	/* do I set the command result tag? */
-	bool		canSetTag;
+	bool		canSetTag pg_node_attr(query_jumble_ignore);
 
 	Node	   *utilityStmt;	/* non-null if commandType == CMD_UTILITY */
 
 	/*
 	 * rtable index of target relation for INSERT/UPDATE/DELETE/MERGE; 0 for
-	 * SELECT.
+	 * SELECT.  This is ignored in the query jumble as unrelated to the
+	 * compilation of the query ID.
 	 */
-	int			resultRelation;
+	int			resultRelation pg_node_attr(query_jumble_ignore);
 
 	/* has aggregates in tlist or havingQual */
-	

Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Michael Paquier
On Wed, Jan 25, 2023 at 12:40:02PM +1300, Thomas Munro wrote:
> I remembered this thread after seeing the failure of Michael's new
> build farm animal "tanager".  I think we need to solve this somehow...

Well, this host has a problem with what looks like a kernel issue, I
guess..  This is repeatable across all the branches, randomly, with
various errors with the POSIX DSM implementation:
# [63cf68b7.5e5a:1] ERROR:  could not open shared memory segment 
"/PostgreSQL.543738922": No such file or directory
# [63cf68b7.5e58:6] ERROR:  could not open shared memory segment 
"/PostgreSQL.543738922": No such file or directory

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tanager&dt=2023-01-24%2004%3A23%3A53
dynamic_shared_memory_type should be using posix in this case.
Switching to mmap may help, perhaps..  I need to test it.

Anyway, sorry for the noise on this one.
--
Michael


signature.asc
Description: PGP signature


Re: suppressing useless wakeups in logical/worker.c

2023-01-24 Thread Tom Lane
Nathan Bossart  writes:
> [ v2-0001-suppress-unnecessary-wakeups-in-logical-worker.c.patch ]

I took a look through this, and have a number of mostly-cosmetic
issues:

* It seems wrong that next_sync_start isn't handled as one of the
wakeup[NUM_LRW_WAKEUPS] entries.  I see that it needs to be accessed from
another module; but you could handle that without exposing either enum
LogRepWorkerWakeupReason or the array, by providing getter/setter
functions for process_syncing_tables_for_apply() to call.
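
(Sketched here with invented names just to show the shape; worker.c would
keep the variable private and tablesync.c would go through the accessors:)

/* in worker.c; names are placeholders, not from the patch */
static TimestampTz next_sync_start = 0;

TimestampTz
LogRepWorkerGetNextSyncStart(void)
{
	return next_sync_start;
}

void
LogRepWorkerSetNextSyncStart(TimestampTz start_time)
{
	next_sync_start = start_time;
}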

* This code is far too intimately familiar with the fact that TimestampTz
is an int64 count of microseconds.  (I'm picky about that because I
remember that they were once something else, so I wonder if someday
they will be different again.)  You could get rid of the PG_INT64_MAX
usages by replacing those with the timestamp infinity macro DT_NOEND;
and I'd even be on board with adding a less-opaque alternate name for
that to datatype/timestamp.h.  The various magic-constant multipliers
could perhaps be made less magic by using TimestampTzPlusMilliseconds(). 

* I think it might be better to construct the enum like this:

+typedef enum LogRepWorkerWakeupReason
+{
+   LRW_WAKEUP_TERMINATE,
+   LRW_WAKEUP_PING,
+   LRW_WAKEUP_STATUS
+#define NUM_LRW_WAKEUPS (LRW_WAKEUP_STATUS + 1)
+} LogRepWorkerWakeupReason;

so that you don't have to have a default: case in switches on the
enum value.  I'm more worried about somebody adding an enum value
and missing updating a switch statement elsewhere than I am about 
somebody adding an enum value and neglecting to update the
immediately-adjacent macro.

* The updates of "now" in LogicalRepApplyLoop seem rather
randomly placed, and I'm not entirely convinced that we'll
always be using a reasonably up-to-date value.  Can't we
just update it right before each usage?

* This special handling of next_sync_start seems mighty ugly:

+/* Also consider special wakeup time for starting sync workers. */
+if (next_sync_start < now)
+{
+/*
+ * Instead of spinning while we wait for the sync worker to
+ * start, wait a bit before retrying (unless there's an earlier
+ * wakeup time).
+ */
+nextWakeup = Min(now + INT64CONST(10), nextWakeup);
+}
+else
+nextWakeup = Min(next_sync_start, nextWakeup);

Do we really need the slop?  If so, is there a reason why it
shouldn't apply to all possible sources of nextWakeup?  (It's
going to be hard to fold next_sync_start into the wakeup[]
array unless you can make this not a special case.)

* It'd probably be worth enlarging the comment for
LogRepWorkerComputeNextWakeup to explain why its API is like that,
perhaps "We ask the caller to pass in the value of "now" because
this frequently avoids multiple calls of GetCurrentTimestamp().
It had better be a reasonably-up-to-date value, though."

regards, tom lane




Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Thomas Munro
On Mon, Mar 8, 2021 at 9:32 PM Kyotaro Horiguchi
 wrote:
> At Sun, 07 Mar 2021 20:09:33 -0500, Tom Lane  wrote in
> > Thomas Munro  writes:
> > > Thanks!  I'm afraid I wouldn't get around to it for a few weeks, so if
> > > you have time, please do.  (I'm not sure if it's strictly necessary to
> > > log *this* xid, if a higher xid has already been logged, considering
> > > that the goal is just to avoid getting confused about an xid that is
> > > recycled after crash recovery, but coordinating that might be more
> > > complicated; I don't know.)
> >
> > Yeah, ideally the patch wouldn't add any unnecessary WAL flush,
> > if there's some cheap way to determine that our XID must already
> > have been written out.  But I'm not sure that it's worth adding
> > any great amount of complexity to avoid that.  For sure I would
> > not advocate adding any new bookkeeping overhead in the mainline
> > code paths to support it.
>
> We need to *write* an additional record if the current transaction
> hasn't yet written one (EnsureTopTransactionIdLogged()). One
> annoyance is the probably most-common usage of calling
> pg_current_xact_id() at the beginning of a transaction, which leads to
> an additional 8-byte-long XLOG_XACT_ASSIGNMENT record. We could also
> avoid that by detecting that a larger xid has already been flushed out.

Yeah, that would be very expensive for users doing that.

> I haven't found a simple and clean way of tracking the maximum
> flushed-out XID.  The new cooperation between xlog.c and xact.c
> related to XID and LSN happening on shared variables makes things
> complex...
>
> So the attached doesn't contain the max-flushed-xid tracking feature.

I guess that would be just as expensive if the user does that
sequentially with small transactions (ie allocating xids one by one).

I remembered this thread after seeing the failure of Michael's new
build farm animal "tanager".  I think we need to solve this somehow...
according to our documentation "Applications might use this function,
for example, to determine whether their transaction committed or
aborted after the application and database server become disconnected
while a COMMIT is in progress.", but it's currently useless or
dangerous for that purpose.




Getting relations accessed by a query using the raw query string

2023-01-24 Thread Amin
Hi,

Having a query string, I am trying to use the postgres parser to find which
relations the query accesses. This is what I currently have:

const char *query_string = "select * from dummytable;";
List	   *parsetree_list = pg_parse_query(query_string);
ListCell   *parsetree_item;

foreach(parsetree_item, parsetree_list)
{
	RawStmt    *parsetree = lfirst_node(RawStmt, parsetree_item);
	Query	   *query = parse_analyze(parsetree, query_string, NULL, 0, NULL);
}

However, when I inspect the variable "query", it is not populated
correctly. For example, commandType is set to CMD_DELETE while I have
passed a SELECT query.
- What am I doing wrong?
- Once I get the query correctly, how can I get the list of relations it
gets access to?
- Or any other ways to get the list of relations from raw query string
through postgres calls?
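
For reference on the relation-list part: once parse analysis succeeds, the
relations a query references directly are recorded in its range table, so
something like the sketch below should work (untested; it ignores
relations that only appear inside sublinks or CTEs, which live in nested
Query nodes and would need query_tree_walker() to reach):

#include "postgres.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"

static List *
collect_relation_oids(Query *query)
{
	List	   *relids = NIL;
	ListCell   *lc;

	foreach(lc, query->rtable)
	{
		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

		/* RTE_RELATION covers tables, views, and materialized views */
		if (rte->rtekind == RTE_RELATION)
			relids = lappend_oid(relids, rte->relid);
	}
	return relids;
}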

Thank you!


Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-24 Thread Peter Smith
Here are my review comments for patch v87-0002.

==
doc/src/sgml/config.sgml

1.

-Allows streaming or serializing changes immediately in
logical decoding.
 The allowed values of logical_replication_mode are
-buffered and immediate. When set
-to immediate, stream each change if
+buffered and immediate.
The default
+is buffered.
+   

I didn't think it was necessary to say “of logical_replication_mode”.
IMO that much is already obvious because this is the first sentence of
the description for logical_replication_mode.

(see also review comment #4)

~~~

2.
+   
+On the publisher side, it allows streaming or serializing changes
+immediately in logical decoding.  When set to
+immediate, stream each change if
 streaming option (see optional parameters set by
 CREATE
SUBSCRIPTION)
 is enabled, otherwise, serialize each change.  When set to
-buffered, which is the default, decoding will stream
-or serialize changes when logical_decoding_work_mem
-is reached.
+buffered, decoding will stream or serialize changes
+when logical_decoding_work_mem is reached.


2a.
"it allows" --> "logical_replication_mode allows"

2b.
"decoding" --> "the decoding"

~~~

3.
+   
+On the subscriber side, if streaming option is set
+to parallel, this parameter also allows the leader
+apply worker to send changes to the shared memory queue or to serialize
+changes.  When set to buffered, the leader sends
+changes to parallel apply workers via shared memory queue.  When set to
+immediate, the leader serializes all changes to
+files and notifies the parallel apply workers to read and apply them at
+the end of the transaction.
+   

"this parameter also allows" --> "logical_replication_mode also allows"

~~~

4.

 This parameter is intended to be used to test logical decoding and
 replication of large transactions for which otherwise we need to
 generate the changes till logical_decoding_work_mem
-is reached.
+is reached. Moreover, this can also be used to test the transmission of
+changes between the leader and parallel apply workers.


"Moreover, this can also" --> "It can also"

I am wondering whether this sentence would be better put at the top of the
GUC description. So then the first paragraph becomes like this:


SUGGESTION (I've also added another sentence "The effect of...")

The allowed values are buffered and immediate. The default is
buffered. This parameter is intended to be used to test logical
decoding and replication of large transactions for which otherwise we
need to generate the changes till logical_decoding_work_mem is
reached. It can also be used to test the transmission of changes
between the leader and parallel apply workers. The effect of
logical_replication_mode is different for the publisher and
subscriber:

On the publisher side...

On the subscriber side...
==
.../replication/logical/applyparallelworker.c

5.
+ /*
+ * In immeidate mode, directly return false so that we can switch to
+ * PARTIAL_SERIALIZE mode and serialize remaining changes to files.
+ */
+ if (logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE)
+ return false;

Typo "immediate"

Also, I felt "directly" is not needed. "return false" and "directly
return false" is the same.

SUGGESTION
Using ‘immediate’ mode returns false to cause a switch to
PARTIAL_SERIALIZE mode so that the remaining changes will be
serialized.

==
src/backend/utils/misc/guc_tables.c

6.
  {
  {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS,
- gettext_noop("Allows streaming or serializing each change in logical
decoding."),
- NULL,
+ gettext_noop("Controls the behavior of logical replication publisher
and subscriber"),
+ gettext_noop("If set to immediate, on the publisher side, it "
+ "allows streaming or serializing each change in "
+ "logical decoding. On the subscriber side, in "
+ "parallel streaming mode, it allows the leader apply "
+ "worker to serialize changes to files and notifies "
+ "the parallel apply workers to read and apply them at "
+ "the end of the transaction."),
  GUC_NOT_IN_SAMPLE
  },

6a. short description

From the user's PoV, the behaviour should be the same. Instead, maybe say "controls
the internal behavior" or something like that?

~

6b. long description

IMO the long description shouldn’t mention ‘immediate’ mode first as it does.

BEFORE
If set to immediate, on the publisher side, ...

AFTER
On the publisher side, ...

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Mutable CHECK constraints?

2023-01-24 Thread Laurenz Albe
On Tue, 2023-01-24 at 01:38 -0500, Tom Lane wrote:
> Laurenz Albe  writes:
> > We throw an error if the expression in a CREATE INDEX statement is not 
> > IMMUTABLE.
> > But while the documentation notes that expressions in CHECK constraints
> > ought to be immutable, we don't enforce that.  Why don't we call something like
> > CheckMutability inside cookConstraint?  Sure, that wouldn't catch all abuse,
> > but it would be better than nothing.
> 
> > There is of course the worry of breaking upgrade for unsafe constraints, 
> > but is
> > there any other reason not to enforce immutability?
> 
> Yeah, that's exactly it, it's a historical exemption for compatibility
> reasons.  There are discussions about this in the archives, if memory
> serves ... but I'm too tired to go digging.

Thanks for the answer.  A search turned up
https://postgr.es/m/AANLkTikwFfvavEX9nDwcRD4_xJb_VAitMeP1IH4wpGIt%40mail.gmail.com

Yours,
Laurenz Albe




Re: Update comments in multixact.c

2023-01-24 Thread Peter Geoghegan
On Wed, Jan 18, 2023 at 2:02 AM shiy.f...@fujitsu.com
 wrote:
> Thanks for your reply.
>
> Attach a patch which fixed them.

Pushed something close to that just now. I decided that it was better
to not specify when truncation happened in these two places at all,
though. The important detail is that it can happen if certain rules
are not followed.

Thanks
-- 
Peter Geoghegan




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-01-24 Thread Andres Freund
Hi,

On 2023-01-24 17:22:03 +0900, Kyotaro Horiguchi wrote:
> Hello.
> 
> At Thu, 19 Jan 2023 21:15:34 -0500, Melanie Plageman 
>  wrote in 
> > Oh dear-- an extra FlushBuffer() snuck in there somehow.
> > Removed it in attached v51.
> > Also, I fixed an issue in my tablespace.sql updates
> 
> I only looked 0002 and 0004.
> (Sorry for the random order of the comment..)
> 
> 0002:
> 
> + Assert(pgstat_bktype_io_stats_valid(bktype_shstats, MyBackendType));
> 
> This is relatively complex checking. We already assert-out increments
> of invalid counters. Thus this is checking whether some unrelated code
> clobbered them, which we do only when consistency is critical. Is
> there any need to do that here?  I saw another occurrence of the same
> assertion.

I found it useful for finding problems.


> + no_temp_rel = bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
> + bktype == B_CHECKPOINTER || bktype == B_AUTOVAC_WORKER ||
> + bktype == B_STANDALONE_BACKEND || bktype == B_STARTUP;
> 
> I'm not sure I like omitting parentheses in such a long Boolean
> expression on the right-hand side.

What parens would help?
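
Maybe one parenthesized comparison per line, e.g. (a sketch of one
possibility only):

	no_temp_rel = ((bktype == B_AUTOVAC_LAUNCHER) ||
				   (bktype == B_BG_WRITER) ||
				   (bktype == B_CHECKPOINTER) ||
				   (bktype == B_AUTOVAC_WORKER) ||
				   (bktype == B_STANDALONE_BACKEND) ||
				   (bktype == B_STARTUP));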


> + write_chunk_s(fpout, );
> + if (!read_chunk_s(fpin, >io.stats))
> 
> The names of the functions hardly make sense on their own to me. How
> about write_struct()/read_struct()?  (I personally prefer to use
> write_chunk() directly.)

That's not related to this patch - there are several existing callers of
it. And write_struct wouldn't be better imo, because it's not just used
for structs.


> + PgStat_BktypeIO
> 
> This patch abbreviates "backend" as "bk", but "be" is used in many
> places. I think the naming should follow the existing precedent.

The precedents aren't consistent, unfortunately :)


> > +Number of read operations in units of op_bytes.
> 
> I may be the only one who sees the name as ambiguous between "total
> number of handled bytes" and "bytes handled per operation". Couldn't
> it be op_blocksize or just block_size?
> 
> +   b.io_object,
> +   b.io_context,

No, block wouldn't be helpful - we'd like to use this for things that aren't
uniform blocks.

Greetings,

Andres Freund




Re: Non-superuser subscription owners

2023-01-24 Thread Robert Haas
On Mon, Jan 23, 2023 at 3:50 PM Jeff Davis  wrote:
> I believe your patch conflates two use cases:
>
> (A) Tightly-coupled servers that are managed by the administrator. In
> this case, there are a finite number of connection strings to make, and
> the admin knows about all of them. Validation is a poor solution for
> this use case, because we get into the weeds trying to figure out
> what's safe or not, overriding the admin's better judgement in some
> cases and letting through connection strings that might be unsafe. A
> much better solution is to simply declare the connection strings as
> some kind of object (perhaps a SERVER object), and hand out privileges
> or inherit them from a predefined role. Having connection string
> objects is also just a better UI: it allows changes to connection
> strings over time to adapt to changing security needs, and allows a
> simple name that is much easier to type and read.
>
> (B) Loosely-coupled servers that the admin doesn't know about, but
> which might be perfectly safe to access. Validation is useful here, but
> it's a long road of fine-grained privileges around acceptable hosts,
> IPs, authentication types, file access, password sources, password
> protocols, connection options, etc. The right solution here is to
> identify the sub-usecases of loosely-coupled servers, and enable them
> (with the appropriate controls) one at a time. Arguably, that's already
> what's happened by demanding a password (even if we don't like the
> mechanism, it does seem to work for some important cases).
>
> Is your patch targeted at use case (A), (B), or both?

I suppose that I would say that the patch is a better fit for (B),
because I'm not proposing to add any kind of intermediate object of
the type you postulate in (A). However, I don't really agree with the
way you've split this up, either. It seems to me that the relevant
question isn't "are the servers tightly coupled?" but rather "could
some user make a mess if we let them use any arbitrary connection
string?".

If you're running all of the machines involved on a private network
that is well-isolated from the Internet and in which only trusted
actors operate, you could use what I'm proposing here for either (A)
or (B) and it would be totally fine. If your server is sitting out on
the public Internet and is adequately secured against malicious
loopback connections, you could also probably use it for either (A) or
(B), unless you've got users who are really shady and you're worried
that the outbound connections that they make from your machine might
get you into trouble, in which case you probably can't use it for
either (A) or (B). Basically, the patch is suitable for cases where
you don't really need to restrict what connection strings people can
use, and unsuitable for cases where you do, but that doesn't have much
to do with whether the servers involved are loosely or tightly
coupled.

I think that you're basically trying to make an argument that some
sort of complex outbound connection filtering is mandatory, and I
still don't really agree with that. We ship postgres_fdw with
something extremely minimal - just a requirement that the password get
used - and the same for dblink. I think those rules suck and are
probably bad and insecure in quite a number of cases, and overly
strict in others, but I can think of no reason why CREATE SUBSCRIPTION
should be held to a higher standard than anything else. The
connections that you can make using CREATE SUBSCRIPTION are strictly
weaker than the ones you can make with dblink, which permits arbitrary
SQL execution. It cannot be right to suppose that a less-exploitable
system needs to be held to a higher security standard than a similar
but more-exploitable system.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-24 Thread Peter Smith
On Tue, Jan 24, 2023 at 11:49 PM houzj.f...@fujitsu.com
 wrote:
>
...
>
> Sorry, the patch set was somehow attached twice. Here is the correct new 
> version
> patch set which addressed all comments so far.
>

Here are my review comments for patch v87-0001.

==
src/backend/replication/logical/reorderbuffer.c

1.
@@ -210,7 +210,7 @@ int logical_decoding_work_mem;
 static const Size max_changes_in_memory = 4096; /* XXX for restore only */

 /* GUC variable */
-int logical_decoding_mode = LOGICAL_DECODING_MODE_BUFFERED;
+int logical_replication_mode = LOGICAL_REP_MODE_BUFFERED;


I noticed that the comment /* GUC variable */ is currently only above
logical_replication_mode, but logical_decoding_work_mem is actually a
GUC variable too. Maybe this should be rearranged somehow, and the
comment changed from "GUC variable" to "GUC variables"?
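
Perhaps something along these lines (a sketch of the suggested arrangement,
not the actual patch):

/* GUC variables */
int			logical_decoding_work_mem;
int			logical_replication_mode = LOGICAL_REP_MODE_BUFFERED;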

==
src/backend/utils/misc/guc_tables.c

@@ -4908,13 +4908,13 @@ struct config_enum ConfigureNamesEnum[] =
  },

  {
- {"logical_decoding_mode", PGC_USERSET, DEVELOPER_OPTIONS,
+ {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS,
  gettext_noop("Allows streaming or serializing each change in logical
decoding."),
  NULL,
  GUC_NOT_IN_SAMPLE
  },
- _decoding_mode,
- LOGICAL_DECODING_MODE_BUFFERED, logical_decoding_mode_options,
+ _replication_mode,
+ LOGICAL_REP_MODE_BUFFERED, logical_replication_mode_options,
  NULL, NULL, NULL
  },

That gettext_noop string seems incorrect. I think Kuroda-san
previously reported the same thing, but then you replied that it had
been fixed already [1]:

> I felt the description does not seem suitable for the current behavior.
> A short description should be something like "Sets the behavior of logical
> replication", and further details can be added in the long description.
I adjusted the description here.

But this doesn't look fixed to me. (??)

==
src/include/replication/reorderbuffer.h

3.
@@ -18,14 +18,14 @@
 #include "utils/timestamp.h"

 extern PGDLLIMPORT int logical_decoding_work_mem;
-extern PGDLLIMPORT int logical_decoding_mode;
+extern PGDLLIMPORT int logical_replication_mode;

Probably there should also be a comment here saying "/* GUC variables */"

--
[1] 
https://www.postgresql.org/message-id/OS0PR01MB5716AE9F095F9E7888987BC794C99%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: heapgettup refactoring

2023-01-24 Thread Melanie Plageman
Thanks for taking a look!

On Mon, Jan 23, 2023 at 6:08 AM David Rowley  wrote:
>
> On Thu, 19 Jan 2023 at 00:04, Peter Eisentraut
>  wrote:
> > In your v2 patch, you remove these assertions:
> >
> > -   /* check that rs_cindex is in sync */
> > -   Assert(scan->rs_cindex < scan->rs_ntuples);
> > -   Assert(lineoff == scan->rs_vistuples[scan->rs_cindex]);
> >
> > Is that intentional?
> >
> > I don't see any explanation, or some other equivalent code appearing
> > elsewhere to replace this.
>
> I guess it's because those asserts are not relevant unless
> heapgettup_no_movement() is being called from heapgettup_pagemode().
> Maybe they can be put back along the lines of:
>
> Assert((scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) == 0 ||
> scan->rs_cindex < scan->rs_ntuples);
> Assert((scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) == 0 || lineoff ==
> scan->rs_vistuples[scan->rs_cindex]);
>
> but it probably would be cleaner to just do an: if
> (scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) { Assert(...);
> Assert(...}; }

I prefer the first method and have implemented that in attached v6.

> The only issue I see with that is that we don't seem to have anywhere
> in the regression tests that call heapgettup_no_movement() when
> rs_flags have SO_ALLOW_PAGEMODE. At least, adding an elog(NOTICE) to
> heapgettup() just before calling heapgettup_no_movement() does not
> seem to cause make check to fail.  I wonder if any series of SQL
> commands would allow us to call heapgettup_no_movement() from
> heapgettup()?

So, the places in which we set scan direction to no movement include:
- explain analyze on a ctas with no data
  EXPLAIN ANALYZE CREATE TABLE foo AS SELECT 1 WITH NO DATA;
  However, in standard_ExecutorRun() we only call ExecutePlan() if the
  ScanDirection is not no movement, so this wouldn't hit our code
- PortalRunSelect
- PersistHoldablePortal()

I can't say I know enough about portals currently to design a test that
will hit this code, but I will poke around some more.

> I think heapgettup_no_movement() also needs a header comment more
> along the lines of:
>
> /*
>  * heapgettup_no_movement
>  *Helper function for NoMovementScanDirection direction for
> heapgettup() and
>  *heapgettup_pagemode.
>  */

I've added a comment but I didn't include the function name in it -- I
find it repetitive when the comments above functions do that -- however,
I'm not strongly attached to that.

> I pushed the pgindent stuff that v5-0001 did along with some additions
> to typedefs.list so that further runs could be done more easily as
> changes are made to these patches.

Cool!

- Melanie
From 5d2e8cb3388d9fa2d122db7ca8d9319f56abcaf9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Tue, 10 Jan 2023 14:15:15 -0500
Subject: [PATCH v6 5/6] Add heapgettup_advance_block() helper

---
 src/backend/access/heap/heapam.c | 167 +--
 1 file changed, 72 insertions(+), 95 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 08c4fb5228..15e4f3262b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -638,6 +638,74 @@ heapgettup_continue_page(HeapScanDesc scan, BlockNumber block, ScanDirection
 	return page;
 }
 
+static inline BlockNumber
+heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
+{
+	if (ScanDirectionIsBackward(dir))
+	{
+		if (block == scan->rs_startblock)
+			return InvalidBlockNumber;
+
+		if (scan->rs_numblocks != InvalidBlockNumber)
+		{
+			if (--scan->rs_numblocks == 0)
+				return InvalidBlockNumber;
+		}
+
+		if (block == 0)
+			block = scan->rs_nblocks;
+
+		block--;
+
+		return block;
+	}
+	else if (scan->rs_base.rs_parallel != NULL)
+	{
+		Assert(ScanDirectionIsForward(dir));
+
+		block = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
+												  scan->rs_parallelworkerdata,
+												  (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel);
+
+		return block;
+	}
+	else
+	{
+		Assert(ScanDirectionIsForward(dir));
+
+		block++;
+
+		if (block >= scan->rs_nblocks)
+			block = 0;
+
+		if (block == scan->rs_startblock)
+			return InvalidBlockNumber;
+
+		if (scan->rs_numblocks != InvalidBlockNumber)
+		{
+			if (--scan->rs_numblocks == 0)
+				return InvalidBlockNumber;
+		}
+
+		/*
+		 * Report our new scan position for synchronization purposes. We don't
+		 * do that when moving backwards, however. That would just mess up any
+		 * other forward-moving scanners.
+		 *
+		 * Note: we do this before checking for end of scan so that the final
+		 * state of the position hint is back at the start of the rel.  That's
+		 * not strictly necessary, but otherwise when you run the same query
+		 * multiple times the starting position would shift a little bit
+		 * backwards on every invocation, which is confusing. We don't
+		 * guarantee any specific ordering in general, though.
+		 */
+		if (scan->rs_base.rs_flags & SO_ALLOW_SYNC)
+	

Re: plpython vs _POSIX_C_SOURCE

2023-01-24 Thread Tom Lane
Andres Freund  writes:
> Python's _POSIX_C_SOURCE value is set to a specific value in their configure
> script:

> if test $define_xopen_source = yes
> then
>   ...
>   AC_DEFINE(_POSIX_C_SOURCE, 200809L, Define to activate features from IEEE 
> Stds 1003.1-2008)
> fi

Hm.  I looked into Python 3.2 (the oldest release we still support)
and it has similar code but

  AC_DEFINE(_POSIX_C_SOURCE, 200112L, Define to activate features from IEEE 
Stds 1003.1-2001)

So yeah it's fixed (or else not defined) for any particular Python
release, but could vary across releases.

> Solaris and AIX are the ones missing.
> I guess I'll test them manually. It seems promising not to need this stuff
> anymore?

Given that hoverfly is AIX, I'm betting there's an issue there.

>> Anyway, I'm still of the opinion that what a11cf433413 tried to do
>> was the best available fix, and we need to do whatever we have to do
>> to plpython's headers to reinstate that coding rule.

> You think it's not a viable path to just remove the _POSIX_C_SOURCE,
> _XOPEN_SOURCE undefines?

I think at the least that will result in warnings on some platforms,
and at the worst in actual build problems.  Maybe there are no more
of the latter a dozen years after the fact, but ...

>> That would be nice.  This old code was certainly mostly concerned with
>> python 2, maybe python 3 no longer does that?

> There's currently no non-comment references to *printf in their headers. The
> only past reference was removed in:
> commit e822e37946f27c09953bb5733acf3b07c2db690f
> Author: Victor Stinner 
> Date:   2020-06-15 21:59:47 +0200
> bpo-36020: Remove snprintf macro in pyerrors.h (GH-20889)

Oh, interesting.

> Which suggests an easier fix would be to just do

> /*
>  * Python versions <= 3.8 otherwise define a replacement, causing
>  * macro redefinition warnings.
>  */
> #define HAVE_SNPRINTF

> And have that be enough for all python versions?

Nice idea.  We did not have that option while we were using HAVE_SNPRINTF
ourselves, but now that we are not I concur that this should work.
(I confirmed that their code looks the same in Python 3.2.)
Note that you'd better make it

#define HAVE_SNPRINTF 1

or you risk macro-redefinition warnings.
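
For concreteness, the top of plpython.h might then look roughly like this
(a sketch, assuming nothing else in recent Python headers still needs the
old undefine dance; not a tested change):

/*
 * Python versions <= 3.8 otherwise define a replacement snprintf,
 * causing macro redefinition warnings.  Defining HAVE_SNPRINTF up
 * front makes Python's headers skip that.
 */
#define HAVE_SNPRINTF 1

#include <Python.h>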

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-01-24 Thread Andrew Dunstan


On 2023-01-24 Tu 13:42, Jelte Fennema wrote:
>> Here's another improvement I think will be useful when the new gadgets
>> are used in a git hook: first, look for the excludes file under the
>> current directory if we aren't setting $code_base (e.g if we have files
>> given on the command line), and second apply the exclude patterns to the
>> command line files as well as to files found using File::Find.
> Change looks good to me.


Thanks, pushed


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Making Vars outer-join aware

2023-01-24 Thread David G. Johnston
On Tue, Jan 24, 2023 at 1:25 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Tue, Jan 24, 2023 at 12:31 PM Tom Lane  wrote:
> >> select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
> >>
> >> If we turn the generic equivclass.c logic loose on these clauses,
> >> it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
> >> the scan of t2, which is even better (since we might be able to turn
> >> that into an indexscan qual).  However, it will also try to apply
> >> t1.x = 1 at the scan of t1, and that's just wrong, because that
> >> will eliminate t1 rows that should come through with null extension.
>
> > Is there a particular comment or README where that last conclusion is
> explained so that it makes sense?
>
> Hm?  It's a LEFT JOIN, so it must not eliminate any rows from t1.
> A row that doesn't have t1.x = 1 will appear in the output with
> null columns for t2 ... but it must still appear, so we cannot
> filter on t1.x = 1 in the scan of t1.
>
>
Ran some queries, figured it out.  Sorry for the noise.  I had turned the
behavior of the RHS appearing in the ON clause into a personal general
rule, then tried to apply it to the LHS (left-join mental model) without
working through the rules from first principles.

David J.


Re: Reducing power consumption on idle servers

2023-01-24 Thread Tom Lane
Thomas Munro  writes:
> Yeah, I definitely want to fix it.  I just worry that 60s is so long
> that it also needs that analysis work to be done to explain that it's
> OK that we're a bit sloppy on noticing when to wake up, at which point
> you might as well go to infinity.

Yeah.  The perfectionist in me wants to say that there should be
explicit wakeups for every event of interest, in which case there's
no need for a timeout.  The engineer in me says "but what about bugs?".
Better a slow reaction than never reacting at all.  OTOH, then you
have to have a discussion about whether 60s (or any other
ice-cap-friendly value) is an acceptable response time even in the
presence of bugs.

It's kind of moot until we've reached the point where we can
credibly claim to have explicit wakeups for every event of
interest.  I don't think we're very close to that today, and
I do think we should try to get closer.  There may come a point
of diminishing returns though.

regards, tom lane




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-24 Thread Peter Geoghegan
On Tue, Jan 24, 2023 at 11:21 AM Robert Haas  wrote:
> > The whole article was about how this DROP TRIGGER pattern worked just
> > fine most of the time, because most of the time autovacuum was just
> > autocancelled. They say this at one point:
> >
> > "The normal autovacuum mechanism is skipped when locks are held in
> > order to minimize service disruption. However, because transaction
> > wraparound is such a severe problem, if the system gets too close to
> > wraparound, an autovacuum is launched that does not back off under
> > lock contention."
>
> If this isn't arguing in favor of exactly what I'm saying, I don't
> know what that would look like.

I'm happy to clear that up. What you said was:

"So I think this sounds like exactly the kind of case I was talking about, where
autovacuums keep getting cancelled until we decide to stop cancelling them.
If so, then they were going to have a problem whenever that happened."

Just because *some* autovacuums get cancelled doesn't mean they *all*
get cancelled. And, even if the rate is quite high, that may not be
much of a problem in itself (especially now that we have the freeze
map). 200 million XIDs usually amounts to a lot of wall clock time.
Even if it is rather difficult to finish up, we only have to get lucky
once.

The fact that autovacuum eventually got to the point of requiring an
antiwraparound autovacuum on the problematic table does indeed
strongly suggest that any other, earlier autovacuums were relatively
unlikely to have advanced relfrozenxid in the end -- or at least
couldn't on this one occasion. But that in itself is just not relevant
to our current discussion, since even the tiniest perturbation would
have been enough to prevent a non-aggressive VACUUM from being able to
advance relfrozenxid. Before 15, non-aggressive VACUUMs would throw
away the opportunity to do so just because they couldn't immediately
get a cleanup lock on one single heap page.

It's quite possible that most or all prior aggressive VACUUMs were not
antiwraparound autovacuums, because the dead tuples accounting was
enough to launch an autovacuum at some point after age(relfrozenxid)
exceeded vacuum_freeze_table_age that was still before it could reach
autovacuum_freeze_max_age. That would give you a cancellable
aggressive VACUUM -- a VACUUM that actually has a non-zero chance of
advancing relfrozenxid.

Sure, it's possible that such a cancellable aggressive autovacuum was
indeed cancelled, and that that factor made the crucial difference.
But I find it far easier to believe that there simply was no such
aggressive autovacuum in the first place (not this time), since it
could have only happened when autovacuum thinks that there are
sufficiently many dead tuples to justify launching an autovacuum in
the first place. Which, as we now all accept, is based on highly
dubious sampling by ANALYZE. So I think it's much more likely to be
that factor (dead tuple accounting is bad), as well as the absurd
false dichotomy between aggressive and non-aggressive -- plus the
issue at hand, the auto-cancellation behavior.

I don't claim to know what is inevitable, or what is guaranteed to
work or not work. I only claim that we can meaningfully reduce the
absolute risk by using a fairly simple approach, principally by not
needlessly coupling the auto-cancellation behavior to *all*
autovacuums that are specifically triggered by age(relfrozenxid). As
Andres said at one point, doing those two things at exactly the same
time is just arbitrary.

--
Peter Geoghegan




Re: Making Vars outer-join aware

2023-01-24 Thread Tom Lane
"David G. Johnston"  writes:
> On Tue, Jan 24, 2023 at 12:31 PM Tom Lane  wrote:
>> select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
>> 
>> If we turn the generic equivclass.c logic loose on these clauses,
>> it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
>> the scan of t2, which is even better (since we might be able to turn
>> that into an indexscan qual).  However, it will also try to apply
>> t1.x = 1 at the scan of t1, and that's just wrong, because that
>> will eliminate t1 rows that should come through with null extension.

> Is there a particular comment or README where that last conclusion is
> explained so that it makes sense?

Hm?  It's a LEFT JOIN, so it must not eliminate any rows from t1.
A row that doesn't have t1.x = 1 will appear in the output with
null columns for t2 ... but it must still appear, so we cannot
filter on t1.x = 1 in the scan of t1.

regards, tom lane




postgres_fdw, dblink, and CREATE SUBSCRIPTION security

2023-01-24 Thread Robert Haas
[ Changing subject line to something more appropriate: This is
branched from the "Non-superuser subscription owners" thread, but the
topic has become connection security more generally for outbound
connections from a PostgreSQL instance, the inadequacies of just
trying to require that such connections always use a password, and
related problems. I proposed some kind of "reverse pg_hba.conf file"
as a way of allowing configurable limits on such outbound connections.
]

On Tue, Jan 24, 2023 at 2:18 PM Jacob Champion  wrote:
> - It's completely reasonable to let a proxy operator restrict how that
> proxy is used. I doubt very much that a typical DBA wants to be
> operating an open proxy.

That's very well put. It's precisely what I was thinking, but
expressed much more clearly.

> - I think the devil will be in the details of the configuration
> design. Lists of allowed destination authorities (in the URI sense),
> options that must be present/absent/overridden, those sound great. But
> your initial examples of allow-loopback and require-passwords options
> are in the "make the DBA deal with it" line of thinking, IMO. I think
> it's difficult for someone to reason through those correctly the first
> time, even for experts. I'd like to instead see the core problem --
> that *any* ambient authentication used by a proxy is inherently risky
> -- exposed as a highly visible concept in the config, so that it's
> hard to make mistakes.

I find the concept of "ambient authentication" problematic. I don't
know exactly what you mean by it. I hope you'll tell me, but I think
that I won't like it even after I know, because as I said before, it's
difficult to know why anyone else makes a decision, and asking an
untrusted third-party why they're deciding something is sketchy at
best. I think that the problems we have in this area can be solved by
either (a) restricting the open proxy to be less open or (b)
encouraging people to authenticate users in some way that won't admit
connections from an open proxy. The former needs to be configurable by
the DBA, and the latter is also a configuration choice by the DBA. We
can provide tools here that make it less likely that people will shoot
themselves in the foot, and we can ship default configurations that
reduce the chance of inadvertent foot-shooting, and we can write
documentation that says "don't shoot yourself in the foot," but we
cannot actually prevent people from shooting themselves in the foot
except, perhaps, by massively nerfing the capabilities of the system.

What I was thinking about in terms of a "reverse pg_hba.conf" was
something in the vein of, e.g.:

SOURCE_COMPONENT SOURCE_DATABASE SOURCE_USER DESTINATION_SUBNET
DESTINATION_DATABASE DESTINATION_USER OPTIONS ACTION

e.g.

all all all local all all - deny # block access through UNIX sockets
all all all 127.0.0.0/8 all all - deny # block loopback interface via IPv4

Or:

postgres_fdw all all all all all authentication=cleartext,md5,sasl
allow # allow postgres_fdw with password-ish authentication

Disallowing loopback connections feels quite tricky. You could use
127.anything.anything.anything, but you could also loop back via IPv6,
or you could loop back via any interface. But you can't use
subnet-based ACLs to rule out loopbacks through IPv4/IPv6 interfaces
unless you know what all your system's own IPs are. Maybe that's an
argument in favor of having a dedicated deny-loopback facility built
into the system instead of relying on IP ACLs. But I am not sure that
really works either: how sure are we that we can discover all of the
local IP addresses? Maybe it doesn't matter anyway, since the point is
just to disallow anything that would be likely to use "trust" or
"ident" authentication, and that's probably not going to include any
non-loopback network interfaces. But ... is that true in general? What
about on Windows?

> - I'm inherently skeptical of solutions that require all clients --
> proxies, in this case -- to be configured correctly in order for a
> server to be able to protect itself. (But I also have a larger
> appetite for security options that break compatibility when turned on.
> :D)

I (still) don't think that restricting the proxy is required, but you
can't both not restrict the proxy and also allow passwordless loopback
superuser connections. You have to pick one or the other. The reason I
keep harping on the role of the DBA is that I don't think we can make
that choice unilaterally on behalf of everyone. We've tried doing that
with the current rules and we've discussed the weaknesses of that
approach already.

> > I don't think the logon trigger thing is all *that* hypothetical. We
> > don't have it yet, but there have been patches proposed repeatedly for
> > many years.
>
> Okay. I think this thread has applicable lessons -- if connection
> establishment itself leads to side effects, all actors in the
> ecosystem (bouncers, proxies) have to be hardened against making those
> connections passively.

Re: Reducing power consumption on idle servers

2023-01-24 Thread Thomas Munro
On Wed, Nov 30, 2022 at 7:40 PM Simon Riggs
 wrote:
> On Wed, 30 Nov 2022 at 03:50, Thomas Munro  wrote:
> > I'm just curious, and not suggesting that 60s wakeups are a problem
> > for the polar ice caps, but why even time out at all?  Are the latch
> > protocols involved not reliable enough?  At a guess from a quick
> > glance, the walwriter's is but maybe the bgwriter could miss a wakeup
> > as it races against StrategyGetBuffer(), which means you might stay
> > asleep until the *next* buffer allocation, but that's already true I
> > think, and a 60s timeout is not much of a defence.
>
> That sounds reasonable.
>
> It does sound like we agree that the existing behavior of waking up
> every 5s or 2.5s is not good. I hope you will act to improve that.
>
> The approach taken in this patch, and others of mine, has been to
> offer a minimal change that achieves the objective of lengthy
> hibernation to save power.
>
> Removing the timeout entirely may not work in other circumstances I
> have not explored. Doing that requires someone to check it actually
> works, and for others to believe that check has occurred. For me, that
> is too time consuming to actually happen in this dev cycle, and time
> is part of the objective since perfect designs yet with unreleased
> code have no utility.
>
> 



Yeah, I definitely want to fix it.  I just worry that 60s is so long
that it also needs that analysis work to be done to explain that it's
OK that we're a bit sloppy on noticing when to wake up, at which point
you might as well go to infinity.




Re: Making Vars outer-join aware

2023-01-24 Thread David G. Johnston
On Tue, Jan 24, 2023 at 12:31 PM Tom Lane  wrote:

> I wrote:
> > Hans Buschmann  writes:
> >> I just noticed your new efforts in this area.
> >> I wanted to recur to my old thread [1] considering constant
> >> propagation of quals.
> >> [1] https://www.postgresql.org/message-id/1571413123735.26...@nidsa.net
>
> > Yeah, this patch series is not yet quite up to the point of improving
> > that.  That area is indeed the very next thing I want to work on, and
> > I did spend some effort on it last month, but I ran out of time to get
> > it working.  Maybe we'll have something there for v17.
>
> BTW, to clarify what's going on there: what I want to do is allow
> the regular equivalence-class machinery to handle deductions from
> equality operators appearing in LEFT JOIN ON clauses (maybe full
> joins too, but I'd be satisfied if it works for one-sided outer
> joins).  I'd originally hoped that distinguishing pre-nulled from
> post-nulled variables would be enough to make that safe, but it's
> not.  Here's an example:
>
> select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
>
> If we turn the generic equivclass.c logic loose on these clauses,
> it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
> the scan of t2, which is even better (since we might be able to turn
> that into an indexscan qual).  However, it will also try to apply
> t1.x = 1 at the scan of t1, and that's just wrong, because that
> will eliminate t1 rows that should come through with null extension.
>
>
Is there a particular comment or README where that last conclusion is
explained so that it makes sense?  Intuitively, I would expect t1.x = 1 to
be applied during the scan of t1 - it isn't like the output of the join is
allowed to include t1 rows not matching that condition anyway.

IOW, I thought the more verbose but equivalent syntax for that was:

select ... from (select * from t1 as insub where insub.x = 1) as t1 left
join t2 on (t1.x = t2.y)

Thanks!

David J.


Re: Making Vars outer-join aware

2023-01-24 Thread Tom Lane
I wrote:
> Hans Buschmann  writes:
>> I just noticed your new efforts in this area.
>> I wanted to recur to my old thread [1] considering constant propagation of 
>> quals.
>> [1]  https://www.postgresql.org/message-id/1571413123735.26...@nidsa.net

> Yeah, this patch series is not yet quite up to the point of improving
> that.  That area is indeed the very next thing I want to work on, and
> I did spend some effort on it last month, but I ran out of time to get
> it working.  Maybe we'll have something there for v17.

BTW, to clarify what's going on there: what I want to do is allow
the regular equivalence-class machinery to handle deductions from
equality operators appearing in LEFT JOIN ON clauses (maybe full
joins too, but I'd be satisfied if it works for one-sided outer
joins).  I'd originally hoped that distinguishing pre-nulled from
post-nulled variables would be enough to make that safe, but it's
not.  Here's an example:

select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);

If we turn the generic equivclass.c logic loose on these clauses,
it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
the scan of t2, which is even better (since we might be able to turn
that into an indexscan qual).  However, it will also try to apply
t1.x = 1 at the scan of t1, and that's just wrong, because that
will eliminate t1 rows that should come through with null extension.

My current plan for making this work is to define
EquivalenceClass-generated clauses as applying within "join domains",
which are sets of inner-joined relations, and in the case of a one-sided
outer join then the join itself belongs to the same join domain as its
right-hand side --- but not to the join domain of its left-hand side.
This would allow us to push EC clauses from an outer join's qual down
into the RHS, but not into the LHS, and then anything leftover would
still have to be applied at the join.  In this example we'd have to
apply t1.x = t2.y or t1.x = 1, but not both, at the join.

I got as far as inventing join domains, in the 0012 patch of this
series, but I haven't quite finished puzzling out the clause application
rules that would be needed for this scenario.  Ordinarily an EC
containing a constant would be fully enforced at the scan level
(i.e., apply t1.x = 1 and t2.y = 1 at scan level) and generate no
additional clauses at join level; but that clearly doesn't work
anymore when some of the scans are outside the join domain.
I think that the no-constant case might need to be different too.
I have some WIP code but nothing I can show.

Also, this doesn't seem to help for full joins.  We can treat the
two sides as each being their own join domains, but then the join's
own ON clause doesn't belong to either one, since we can't throw
away rows from either side on the basis of a restriction from ON.
So it seems like we'll still need ad-hoc logic comparable to
reconsider_full_join_clause, if we want to preserve that optimization.
I'm only mildly discontented with that, but still discontented.

regards, tom lane




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-24 Thread Robert Haas
On Fri, Jan 20, 2023 at 4:24 PM Peter Geoghegan  wrote:
> > It sounds like they used DROP TRIGGER pretty regularly. So I think this
> > sounds like exactly the kind of case I was talking about, where
> > autovacuums keep getting cancelled until we decide to stop cancelling
> > them.
>
> I don't know how you can reach that conclusion.

I can accept that there might be some way I'm wrong about this in
theory, but your email then seems to go on to say that I'm right just
a few sentences later:

> The whole article was about how this DROP TRIGGER pattern worked just
> fine most of the time, because most of the time autovacuum was just
> autocancelled. They say this at one point:
>
> "The normal autovacuum mechanism is skipped when locks are held in
> order to minimize service disruption. However, because transaction
> wraparound is such a severe problem, if the system gets too close to
> wraparound, an autovacuum is launched that does not back off under
> lock contention."

If this isn't arguing in favor of exactly what I'm saying, I don't
know what that would look like.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Unicode grapheme clusters

2023-01-24 Thread Bruce Momjian
On Tue, Jan 24, 2023 at 11:40:01AM -0500, Greg Stark wrote:
> On Sat, 21 Jan 2023 at 13:17, Tom Lane  wrote:
> >
> > Probably our long-term answer is to avoid depending on wcwidth
> > and use wcswidth instead.  But it's hard to get excited about
> > doing the legwork for that until popular libc implementations
> > get it right.
> 
> Here's an interesting blog post about trying to do this in Rust:
> 
> https://tomdebruijn.com/posts/rust-string-length-width-calculations/
> 
> TL;DR... Even counting the number of graphemes isn't enough because
> terminals typically (but not always) display emoji graphemes using two
> columns.
> 
> At the end of the day Unicode kind of assumes a variable-width display
> where the rendering is handled by something that has access to the
> actual font metrics. So anything trying to line things up in columns
> in a way that works with any rendering system down the line using any
> font is going to be making a best guess.

Yes, good article, though I am still surprised this is not discussed
more often.  Anyway, for psql, we assume a fixed-width output device, so
we can just assume that for the computation.  You are right that Unicode
just doesn't seem to consider fixed-width output cases and doesn't
provide much guidance.
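
As a toy illustration of the wcswidth() direction (assumptions: a UTF-8
locale, and a libc whose wcswidth() actually accounts for combining
sequences -- which is exactly what's in doubt today):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int
main(void)
{
	/* "resume" with combining acute accents: 8 code points, 6 columns */
	const char *s = "re\u0301sume\u0301";
	wchar_t		wbuf[128];
	size_t		len;

	setlocale(LC_ALL, "");
	len = mbstowcs(wbuf, s, 128);
	if (len == (size_t) -1)
		return 1;

	/*
	 * Whole-string width in one call; a libc with grapheme-aware tables
	 * could make this differ from naively summing wcwidth() per char.
	 */
	printf("display width: %d\n", wcswidth(wbuf, len));
	return 0;
}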

Beyond psql, should we update our docs to say that character_length()
for Unicode returns the number of Unicode code points, and not
necessarily the number of displayed characters if grapheme clusters are
present?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Non-superuser subscription owners

2023-01-24 Thread Jacob Champion
On Tue, Jan 24, 2023 at 5:50 AM Robert Haas  wrote:
> I think this has some potential, but it's pretty complex, seeming to
> require protocol extensions and having backward-compatibility problems
> and so on.

Yeah.

> What do you think about something in the spirit of a
> reverse-pg_hba.conf? The idea being that PostgreSQL facilities that
> make outbound connections are supposed to ask it whether those
> connections are OK to initiate.  Then you could have a default
> configuration that basically says "don't allow loopback connections"
> or "require passwords all the time" or whatever we like, and the DBA
> can change that as desired.

Well, I'll have to kick the idea around a little bit. Kneejerk reactions:

- It's completely reasonable to let a proxy operator restrict how that
proxy is used. I doubt very much that a typical DBA wants to be
operating an open proxy.

- I think the devil will be in the details of the configuration
design. Lists of allowed destination authorities (in the URI sense),
options that must be present/absent/overridden, those sound great. But
your initial examples of allow-loopback and require-passwords options
are in the "make the DBA deal with it" line of thinking, IMO. I think
it's difficult for someone to reason through those correctly the first
time, even for experts. I'd like to instead see the core problem --
that *any* ambient authentication used by a proxy is inherently risky
-- exposed as a highly visible concept in the config, so that it's
hard to make mistakes.

- I'm inherently skeptical of solutions that require all clients --
proxies, in this case -- to be configured correctly in order for a
server to be able to protect itself. (But I also have a larger
appetite for security options that break compatibility when turned on.
:D)

> > For the hypothetical logon trigger, or any case where the server does
> > something on behalf of a user upon connection, I agree it doesn't help you.
>
> I don't think the logon trigger thing is all *that* hypothetical. We
> don't have it yet, but there have been patches proposed repeatedly for
> many years.

Okay. I think this thread has applicable lessons -- if connection
establishment itself leads to side effects, all actors in the
ecosystem (bouncers, proxies) have to be hardened against making those
connections passively. I know we're very different from HTTP, but it
feels similar to their concept of method safety and the consequences
of violating it.

--Jacob




Re: HOT chain validation in verify_heapam()

2023-01-24 Thread Robert Haas
On Sun, Jan 22, 2023 at 10:19 AM Himanshu Upadhyaya
 wrote:
> I was trying to use lp_valid because I need to identify the root of the
> HOT chain, and we validate the root of the HOT chain when we loop over
> the predecessor array.
> I was resetting lp_valid in the last patch because we don't add data to
> predecessor[], and while looping over the predecessor array we need to
> isolate (and identify) all cases of missing data in the predecessor
> array to identify the root of the HOT chain exactly.
> One solution is to always add data to the predecessor array while
> looping over the successor array; then, while looping over the
> predecessor array, we can continue with the other validation when
> "lp_valid[predecessor[currentoffnum]] && lp_valid[currentoffnum]" is
> true, but in that case our third loop will also look at lp_valid[].

I don't mind if the third loop looks at lp_valid if it has a reason to
do that, but I don't think we should be resetting values from true to
false. Once we know a line pointer to be valid, it doesn't stop being
valid later because we found out some other thing about something
else.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: plpython vs _POSIX_C_SOURCE

2023-01-24 Thread Andres Freund
Hi,

On 2023-01-24 12:55:15 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > The background for the undefines is that _POSIX_C_SOURCE needs to be defined
> > the same for the whole compilation, not change in the middle, and Python.h
> > defines it. To protect "our" parts a11cf433413 instituted the rule that all
> > postgres headers have to be included first. But then that promptly got 
> > broken
> > in 147c2482542.
> 
> > But apparently the breakage in 147c2482542 was partial enough that we didn't
> > run into obvious trouble so far (although I wonder if some of the linkage
> > issues we had in the past with plpython could be related).
> 
> I found the discussion thread that led up to a11cf433413:
> 
> https://www.postgresql.org/message-id/flat/4DB3B546.9080508%40dunslane.net
> 
> What we originally set out to fix, AFAICS, was compiler warnings about
> _POSIX_C_SOURCE getting redefined with a different value.  I think that'd
> only happen if pyconfig.h had originally been constructed on a machine
> where _POSIX_C_SOURCE was different from what prevails in a Postgres
> build.

Python's _POSIX_C_SOURCE value is set to a specific value in their configure
script:

if test $define_xopen_source = yes
then
  # X/Open 7, incorporating POSIX.1-2008
  AC_DEFINE(_XOPEN_SOURCE, 700,
Define to the level of X/Open that your system supports)

  # On Tru64 Unix 4.0F, defining _XOPEN_SOURCE also requires
  # definition of _XOPEN_SOURCE_EXTENDED and _POSIX_C_SOURCE, or else
  # several APIs are not declared. Since this is also needed in some
  # cases for HP-UX, we define it globally.
  AC_DEFINE(_XOPEN_SOURCE_EXTENDED, 1,
Define to activate Unix95-and-earlier features)
 
  AC_DEFINE(_POSIX_C_SOURCE, 200809L, Define to activate features from IEEE 
Stds 1003.1-2008)
fi

So the concrete values don't depend on the environment (though whether they
are set at all does: SunOS, HP-UX, and a bunch of obsolete OS versions don't
set them). But I somehow doubt we'll see a different _POSIX_C_SOURCE value
coming up.


> So I wouldn't see this warning, and I venture that you'd never see
> it on any other Linux/glibc platform either.

Yea, it works just fine on linux without it, I tried that already.


In fact, removing the _POSIX_C_SOURCE stuff builds without additional warnings (*)
on freebsd, netbsd, linux (centos 7, fedora rawhide, debian bullseye, sid),
macOS, openbsd, windows with msvc and mingw.

Those I can test automatedly with the extended set of CI OSs:
https://cirrus-ci.com/build/4853456020701184

Some of the OSs are still running tests, but I doubt there's a runtime issue.


Solaris and AIX are the ones missing.

I guess I'll test them manually. It seems promising not to need this stuff
anymore?


(*) I see one existing plpython related warning on netbsd 9:
[18:49:12.710] ld: warning: libintl.so.1, needed by 
/usr/pkg/lib/libpython3.9.so, may conflict with libintl.so.8

But that's not related to this change. I assume it's an issue with one side
using libintl from /usr/lib and the other from /usr/pkg/lib.


> The 2011 thread started with concerns about Windows, where it's a lot easier
> to believe that there might be mismatched build environments.  But maybe
> nobody's actually set up a Windows box with that particular problem since
> 2011.

The native python doesn't have the issue, the windows pyconfig.h doesn't
define _POSIX_SOURCE et al. It's possible that there's a problem with older
mingw versions though - but I don't really care about old mingw versions, tbh.


> Whether inconsistency in _POSIX_C_SOURCE could lead to worse problems
> than a compiler warning isn't entirely clear to me, but it certainly
> seems possible.

I'm sure it can lead to compiler errors; IIRC there are some struct members
only defined for certain values.


> Anyway, I'm still of the opinion that what a11cf433413 tried to do
> was the best available fix, and we need to do whatever we have to do
> to plpython's headers to reinstate that coding rule.

You think it's not a viable path to just remove the _POSIX_C_SOURCE,
_XOPEN_SOURCE undefines?


> > The most minimal fix I can see is to institute the rule that no plpy_*.h
> > header can include a postgres header other than plpython.h.
> 
> Doesn't sound *too* awful.

It's not too bad to make the change, but I'm less hopeful about it staying
fixed. I can't think of a reasonable way to make violations of the rule error
out.


> > Or we could see what breaks if we just don't care about _POSIX_C_SOURCE -
> > evidently we haven't succeeded in making this work for a long time.
> 
> Well, hoverfly is broken right now ...

What I mean is that we haven't handled the _POSIX_C_SOURCE stuff correctly for
a long time and the only problem that became apparent is hoverfly's issue, and
that's a problem of undefining _POSIX_C_SOURCE, rather than it being
redefined.



> >  * Sometimes python carefully scribbles on our *printf macros.
> >  * So we undefine them here and redefine them after it's done its dirty deed.

Re: run pgindent on a regular basis / scripted manner

2023-01-24 Thread Bruce Momjian
On Tue, Jan 24, 2023 at 09:54:57AM -0500, Tom Lane wrote:
> As another example, the mechanisms we use to create the typedefs list
> in the first place are pretty squishy/leaky: they depend on which
> buildfarm animals are running the typedef-generation step, and on
> whether anything's broken lately in that code --- which happens on
> a fairly regular basis (eg [1]).  Maybe that could be improved,
> but I don't see an easy way to capture the set of system-defined
> typedefs that are in use on platforms other than your own.  I
> definitely do not want to go over to hand maintenance of that list.

As I now understand it, we would need to standardize on a typedef list
at the beginning of each major development cycle, and then only allow
additions, and each addition would have to include any pgindent effects
of that addition.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: run pgindent on a regular basis / scripted manner

2023-01-24 Thread Jelte Fennema
> I think that would be undesirable, because then reindentation noise
> in completely-unrelated files would get baked into feature commits,
> complicating review and messing up "git blame" history.

With a rebase hook similar to the pre-commit hook that I shared
upthread, your files will be changed accordingly, but you don't need
to commit those changes in the same commit as the one that you're
rebasing. You could append another commit after it. Another option
would be to move the typedefs.list change to a separate commit
together with all project-wide indentation changes.

> The real bottom line here is that AFAICT, there are fewer committers
> who care about indent cleanliness than committers who do not, so
> I do not think that the former group get to impose strict rules
> on the latter, much as I might wish otherwise.

Is this actually the case? I haven't seen anyone in this thread say
they don't care. From my perspective it seems like the unclean
indents simply come from forgetting to run pgindent once in
a while. And those few forgetful moments add up over a year
of commits. That's why tooling seems to me to be the answer here.
If the tooling makes it easy not to forget, then the problem
goes away.

> FWIW, Andrew's recent --show-diff feature for pgindent has
> already improved my workflow for that.  I can do
> "pgindent --show-diff >fixindent.patch", manually remove any hunks
> in fixindent.patch that don't pertain to the code I'm working on,
> and apply what remains to fix up my new code.  (I had been doing
> something basically like this, but with more file-copying steps
> to undo pgindent's edit-in-place behavior.)

Yeah, I have a similar workflow with the pre-commit hook that
I shared. By using "git checkout -p" I can remove hunks that
don't pertain to my code. Still, it would be really nice not
to have to go through that effort (which is significant for the
libpq code that I've been working on, since there are ~50
incorrectly indented hunks).

> Here's another improvement I think will be useful when the new gadgets
> are used in a git hook: first, look for the excludes file under the
> current directory if we aren't setting $code_base (e.g if we have files
> given on the command line), and second apply the exclude patterns to the
> command line files as well as to files found using File::Find.

Change looks good to me.




Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-24 Thread Nathan Bossart
On Tue, Jan 24, 2023 at 01:13:55PM -0500, Tom Lane wrote:
> Either that comment needs to be rewritten or we need to invent some
> more macros.

Here is a first attempt at a patch.  I scanned through all the existing
uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything
else that needed adjusting.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 564bffe5ca..970d170e73 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -922,8 +922,8 @@ ApplyLauncherShmemInit(void)
 
 		memset(LogicalRepCtx, 0, ApplyLauncherShmemSize());
 
-		LogicalRepCtx->last_start_dsa = DSM_HANDLE_INVALID;
-		LogicalRepCtx->last_start_dsh = DSM_HANDLE_INVALID;
+		LogicalRepCtx->last_start_dsa = DSA_HANDLE_INVALID;
+		LogicalRepCtx->last_start_dsh = DSHASH_HANDLE_INVALID;
 
 		/* Initialize memory and spin locks for each worker slot. */
 		for (slot = 0; slot < max_logical_replication_workers; slot++)
@@ -947,7 +947,7 @@ logicalrep_launcher_attach_dshmem(void)
 	MemoryContext oldcontext;
 
 	/* Quick exit if we already did this. */
-	if (LogicalRepCtx->last_start_dsh != DSM_HANDLE_INVALID &&
+	if (LogicalRepCtx->last_start_dsh != DSHASH_HANDLE_INVALID &&
 		last_start_times != NULL)
 		return;
 
@@ -957,7 +957,7 @@ logicalrep_launcher_attach_dshmem(void)
 	/* Be sure any local memory allocated by DSA routines is persistent. */
 	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
 
-	if (LogicalRepCtx->last_start_dsh == DSM_HANDLE_INVALID)
+	if (LogicalRepCtx->last_start_dsh == DSHASH_HANDLE_INVALID)
 	{
 		/* Initialize dynamic shared hash table for last-start times. */
 		last_start_times_dsa = dsa_create(LWTRANCHE_LAUNCHER_DSA);
diff --git a/src/include/lib/dshash.h b/src/include/lib/dshash.h
index 152927742e..c284c8489c 100644
--- a/src/include/lib/dshash.h
+++ b/src/include/lib/dshash.h
@@ -23,6 +23,9 @@ typedef struct dshash_table dshash_table;
 /* A handle for a dshash_table which can be shared with other processes. */
 typedef dsa_pointer dshash_table_handle;
 
+/* Special value for an uninitialized dshash_table_handle */
+#define DSHASH_HANDLE_INVALID ((dshash_table_handle) InvalidDsaPointer)
+
 /* The type for hash values. */
 typedef uint32 dshash_hash;
 
diff --git a/src/include/utils/dsa.h b/src/include/utils/dsa.h
index 104386e674..ee59a76447 100644
--- a/src/include/utils/dsa.h
+++ b/src/include/utils/dsa.h
@@ -99,6 +99,9 @@ typedef pg_atomic_uint64 dsa_pointer_atomic;
  */
 typedef dsm_handle dsa_handle;
 
+/* Special value for an uninitialized dsa_handle */
+#define DSA_HANDLE_INVALID ((dsa_handle) DSM_HANDLE_INVALID)
+
 extern dsa_area *dsa_create(int tranche_id);
 extern dsa_area *dsa_create_in_place(void *place, size_t size,
 	 int tranche_id, dsm_segment *segment);


Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-01-24 Thread Alvaro Herrera
Looking again, I have two thoughts for making things easier:

1. I don't think wait_for_write_catchup is necessary, because
calling wait_for_catchup() and omitting the 'mode' and 'lsn' arguments
would already do the same thing.  So what we should do is patch places
that currently give those two arguments, so that they don't.

2. Because wait_for_replay_catchup is an instance method, passing the
second node as argument is needlessly noisy, because that's already
known as $self.  So we can just say

  $primary_node->wait_for_replay_catchup($standby_node);

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-24 Thread Tom Lane
Nathan Bossart  writes:
> IMO ideally there should be a DSA_HANDLE_INVALID and DSHASH_HANDLE_INVALID
> for use with dsa_handle and dshash_table_handle, respectively.  But your
> patch does seem like an improvement.

Yeah, particularly given that dsa.h says

/*
 * The handle for a dsa_area is currently implemented as the dsm_handle
 * for the first DSM segment backing this dynamic storage area, but client
 * code shouldn't assume that is true.
 */
typedef dsm_handle dsa_handle;

but then provides no way for client code to not be aware that a
dsa_handle is a dsm_handle, if it needs to deal with "invalid" values.
Either that comment needs to be rewritten or we need to invent some
more macros.

I agree that the patch as given is an improvement on what was
committed, but I wonder whether we shouldn't work a little harder
on cleaning this up more widely.

regards, tom lane




Re: plpython vs _POSIX_C_SOURCE

2023-01-24 Thread Tom Lane
Andres Freund  writes:
> The background for the undefines is that _POSIX_C_SOURCE needs to be defined
> the same for the whole compilation, not change in the middle, and Python.h
> defines it. To protect "our" parts a11cf433413 instituted the rule that all
> postgres headers have to be included first. But then that promptly got broken
> in 147c2482542.

> But apparently the breakage in 147c2482542 was partial enough that we didn't
> run into obvious trouble so far (although I wonder if some of the linkage
> issues we had in the past with plpython could be related).

I found the discussion thread that led up to a11cf433413:

https://www.postgresql.org/message-id/flat/4DB3B546.9080508%40dunslane.net

What we originally set out to fix, AFAICS, was compiler warnings about
_POSIX_C_SOURCE getting redefined with a different value.  I think that'd
only happen if pyconfig.h had originally been constructed on a machine
where _POSIX_C_SOURCE was different from what prevails in a Postgres
build.  On my RHEL8 box, I see that /usr/include/python3.6m/pyconfig-64.h
unconditionally does

#define _POSIX_C_SOURCE 200809L

while /usr/include/features.h can set a few different values, but the
one that would always prevail for us is

#ifdef _GNU_SOURCE
...
# undef  _POSIX_C_SOURCE
# define _POSIX_C_SOURCE200809L

So I wouldn't see this warning, and I venture that you'd never see
it on any other Linux/glibc platform either.  The 2011 thread started
with concerns about Windows, where it's a lot easier to believe that
there might be mismatched build environments.  But maybe nobody's
actually set up a Windows box with that particular problem since 2011.

Whether inconsistency in _POSIX_C_SOURCE could lead to worse problems
than a compiler warning isn't entirely clear to me, but it certainly
seems possible.

Anyway, I'm still of the opinion that what a11cf433413 tried to do
was the best available fix, and we need to do whatever we have to do
to plpython's headers to reinstate that coding rule.

> The most minimal fix I can see is to institute the rule that no plpy_*.h
> header can include a postgres header other than plpython.h.

Doesn't sound *too* awful.

> Or we could see what breaks if we just don't care about _POSIX_C_SOURCE -
> evidently we haven't succeeded in making this work for a long time.

Well, hoverfly is broken right now ...

>  * Sometimes python carefully scribbles on our *printf macros.
>  * So we undefine them here and redefine them after it's done its dirty deed.

> I didn't find code in recent-ish python to do that. Perhaps we should try to
> get away with not doing that?

That would be nice.  This old code was certainly mostly concerned with
python 2, maybe python 3 no longer does that?  (Unfortunately, the
_POSIX_C_SOURCE business is clearly still there in current python.)

regards, tom lane




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-24 Thread Ankit Kumar Pandey




> I think more benchmarking is required
> so we can figure out if this is a corner case or a common case

I did some more benchmarks:

#1. AIM: Pushdown column whose size is very high

create table test(a int, b int, c text);
insert into test select a,b,c from generate_series(1,1000)a,
  generate_series(1,1000)b, repeat(md5(random()::text), 999)c;

explain (analyze, costs off) select count(*) over (order by a),
  row_number() over (order by a, b) from test order by a,b,c;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Incremental Sort (actual time=1161.605..6577.141 rows=1000000 loops=1)
   Sort Key: a, b, c
   Presorted Key: a, b
   Full-sort Groups: 31250  Sort Method: quicksort  Average Memory: 39kB  Peak Memory: 39kB
   ->  WindowAgg (actual time=1158.896..5819.460 rows=1000000 loops=1)
         ->  WindowAgg (actual time=1154.614..3391.537 rows=1000000 loops=1)
               ->  Gather Merge (actual time=1154.602..2404.125 rows=1000000 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Sort (actual time=1118.326..1295.743 rows=333333 loops=3)
                           Sort Key: a, b
                           Sort Method: external merge  Disk: 145648kB
                           Worker 0:  Sort Method: external merge  Disk: 140608kB
                           Worker 1:  Sort Method: external merge  Disk: 132792kB
                           ->  Parallel Seq Scan on test (actual time=0.018..169.319 rows=333333 loops=3)
 Planning Time: 0.091 ms
 Execution Time: 6816.616 ms
(17 rows)

The planner correctly chose the faster path (which was not the path with the
pushed-down column).

#2. AIM: Check strict vs incremental sorts w.r.t. large data
The patched version is faster: for an external merge sort, disk I/O is the
main bottleneck, so sorting an extra column doesn't have a major impact.
This is when work_mem is very small.

For larger work_mem, the difference between the patched version and master
is minimal and both provide comparable performance.

I tried permutations of a few cases we have already covered but did not see
anything alarming in those.



> I'm just unsure if we should write this off as the expected behaviour
> of Sort and continue with the patch or delay the whole thing until we
> make some improvements to sort.

I am not seeing other cases where the patched version is consistently slower.


Thanks,
Ankit






Re: Non-superuser subscription owners

2023-01-24 Thread Andrew Dunstan


On 2023-01-24 Tu 08:50, Robert Haas wrote:
>
> What do you think about something in the spirit of a
> reverse-pg_hba.conf? The idea being that PostgreSQL facilities that
> make outbound connections are supposed to ask it whether those
> connections are OK to initiate.  Then you could have a default
> configuration that basically says "don't allow loopback connections"
> or "require passwords all the time" or whatever we like, and the DBA
> can change that as desired. We could teach dblink, postgres_fdw, and
> CREATE SUBSCRIPTION to use this new thing, and third-party code could
> adopt it if it likes.
>

I kinda like this idea, especially if we could specify the context that
rules are to apply in. e.g. postgres_fdw, mysql_fdw etc. I'd certainly
give it an outing in the redis_fdw if appropriate.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: psql: Add role's membership options to the \du+ command

2023-01-24 Thread David G. Johnston
On Mon, Jan 9, 2023 at 9:09 AM Pavel Luzanov wrote:

> When you include one role in another, you can specify three options:
> ADMIN, INHERIT (added in e3ce2de0) and SET (3d14e171).
>
> For example.
>
> CREATE ROLE alice LOGIN;
>
> GRANT pg_read_all_settings TO alice WITH ADMIN TRUE, INHERIT TRUE, SET
> TRUE;
> GRANT pg_stat_scan_tables TO alice WITH ADMIN FALSE, INHERIT FALSE, SET
> FALSE;
> GRANT pg_read_all_stats TO alice WITH ADMIN FALSE, INHERIT TRUE, SET FALSE;
>
> For information about the options, you need to look in the pg_auth_members:
>
> SELECT roleid::regrole, admin_option, inherit_option, set_option
> FROM pg_auth_members
> WHERE member = 'alice'::regrole;
>         roleid        | admin_option | inherit_option | set_option
> ----------------------+--------------+----------------+------------
>  pg_read_all_settings | t            | t              | t
>  pg_stat_scan_tables  | f            | f              | f
>  pg_read_all_stats    | f            | t              | f
> (3 rows)
>
> I think it would be useful to be able to get this information with a
> psql command
> like \du (and \dg). With proposed patch the \du command still only lists
> the roles of which alice is a member:
>
> \du alice
>                                  List of roles
>  Role name | Attributes |                          Member of
> -----------+------------+---------------------------------------------------------------
>  alice     |            | {pg_read_all_settings,pg_read_all_stats,pg_stat_scan_tables}
>
> But the \du+ command adds information about the selected ADMIN, INHERIT
> and SET options:
>
> \du+ alice
>                                      List of roles
>  Role name | Attributes |                    Member of                    | Description
> -----------+------------+-------------------------------------------------+-------------
>  alice     |            | pg_read_all_settings WITH ADMIN, INHERIT, SET  +|
>            |            | pg_read_all_stats WITH INHERIT                 +|
>            |            | pg_stat_scan_tables                             |
>
> One more change. The roles in the "Member of" column are sorted for both
> \du+ and \du for consistent output.
>
> Any comments are welcome.
>
>
Yeah, I noticed the lack too, then went a bit too far afield with trying to
compose a graph of the roles.  I'm still working on that but at this point
it probably won't be something I try to get committed to psql.  Something
more limited like this does need to be included.

One thing I did was name the situation where none of the grants are true -
EMPTY.  So: pg_stat_scan_tables WITH EMPTY.

I'm not too keen on the idea of converting the existing array into a
newline separated string.  I would try hard to make the modification here
purely additional.  If users really want to build up queries on their own
they should be using the system catalog.  So concise human readability
should be the goal here.  Keeping those two things in mind I would add a
new text[] column to the views with the following possible values in each
cell the meaning of which should be self-evident or this probably isn't a
good approach...

ais
ai
as
a
is
i
s
empty
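
A minimal sketch of how such an abbreviation could be computed from the
catalog (assuming the pg_auth_members option columns quoted above, which are
new in the development branch; the "opts" alias and the query shape are
illustrative, not part of any patch):

SELECT roleid::regrole AS member_of,
       coalesce(nullif(
           CASE WHEN admin_option   THEN 'a' ELSE '' END ||
           CASE WHEN inherit_option THEN 'i' ELSE '' END ||
           CASE WHEN set_option     THEN 's' ELSE '' END, ''),
           'empty') AS opts
FROM pg_auth_members
WHERE member = 'alice'::regrole
ORDER BY 1;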

That said, I do find the newline based output to be quite useful in the
graph query I'm writing and so wouldn't be disappointed if we changed over
to that.  I'd probably stick with abbreviations though.  Another thing I
did with the graph was have both "member" and "memberof" columns in the
output.  In short, every grant row in pg_auth_members appears twice, once
in each column, so the role being granted membership and the role into
which membership is granted both have visibility when you filter on them.
For the role graph I took this idea and extended out to an entire chain of
roles (and also broke out user and group separately) but I think doing the
direct-grant only here would still be a big improvement.

postgres=# \dgS+ pg_read_all_settings
                                 List of roles
      Role name       |  Attributes  | Member of |    Members     | Description
----------------------+--------------+-----------+----------------+-------------
 pg_read_all_settings | Cannot login | {}        | { pg_monitor } |

David J.


Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-24 Thread Nathan Bossart
On Tue, Jan 24, 2023 at 02:55:07AM +, houzj.f...@fujitsu.com wrote:
> I noticed one minor thing in this commit. 
> 
> -
> LogicalRepCtx->last_start_dsh = DSM_HANDLE_INVALID;
> -
> 
> The code takes last_start_dsh as a dsm_handle, but it seems it is a
> dsa_pointer ("typedef dsa_pointer dshash_table_handle;"). This won't
> cause any problem, but I feel it would be easier to understand if we
> take it as a dsa_pointer and use InvalidDsaPointer here, like what the
> attached patch does. What do you think?

IMO ideally there should be a DSA_HANDLE_INVALID and DSHASH_HANDLE_INVALID
for use with dsa_handle and dshash_table_handle, respectively.  But your
patch does seem like an improvement.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: run pgindent on a regular basis / scripted manner

2023-01-24 Thread Andrew Dunstan

On 2023-01-24 Tu 11:43, Tom Lane wrote:
> Jelte Fennema  writes:
>> Sounds like this conflict could be handled fairly easily by
>> having a local git hook rerunning pgindent whenever
>> you rebase a commit:
>> 1. if you changed typedefs.list the hook would format all files
>> 2. if you didn't it only formats the files that you changed
> I think that would be undesirable, because then reindentation noise
> in completely-unrelated files would get baked into feature commits,
> complicating review and messing up "git blame" history.
> The approach we currently have allows reindent effects to be
> separated into ignorable labeled commits, which is a nice property.
>
>> Merge failures are one issue. But personally the main benefit that
>> I would be getting is being able to run pgindent on the files
>> I'm editing and get this weird +12 columns formatting correct
>> without having to manually type it. Without pgindent also
>> changing random parts of the files that someone else touched
>> a few commits before me.
> Yeah, that always annoys me too, but I've always considered that
> it's my problem not something I can externalize onto other people.
> The real bottom line here is that AFAICT, there are fewer committers
> who care about indent cleanliness than committers who do not, so
> I do not think that the former group get to impose strict rules
> on the latter, much as I might wish otherwise.
>
> FWIW, Andrew's recent --show-diff feature for pgindent has
> already improved my workflow for that.  I can do
> "pgindent --show-diff >fixindent.patch", manually remove any hunks
> in fixindent.patch that don't pertain to the code I'm working on,
> and apply what remains to fix up my new code.  (I had been doing
> something basically like this, but with more file-copying steps
> to undo pgindent's edit-in-place behavior.)


I'm glad it's helpful.

Here's another improvement I think will be useful when the new gadgets
are used in a git hook: first, look for the excludes file under the
current directory if we aren't setting $code_base (e.g., if we have files
given on the command line), and second, apply the exclude patterns to the
command-line files as well as to files found using File::Find.

I propose to apply this fairly soon.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 5eff1f8032..1f95a1a34e 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -63,6 +63,10 @@ $code_base ||= '.' unless @ARGV;
 $excludes ||= "$code_base/src/tools/pgindent/exclude_file_patterns"
   if $code_base && -f "$code_base/src/tools/pgindent/exclude_file_patterns";
 
+# also look under the current directory for the exclude patterns file
+$excludes ||= "src/tools/pgindent/exclude_file_patterns"
+  if -f "src/tools/pgindent/exclude_file_patterns";
+
 # The typedef list that's mechanically extracted by the buildfarm may omit
 # some names we want to treat like typedefs, e.g. "bool" (which is a macro
 # according to <stdbool.h>), and may include some names we don't want
@@ -421,8 +425,6 @@ File::Find::find(
 	},
 	$code_base) if $code_base;
 
-process_exclude();
-
 $filtered_typedefs_fh = load_typedefs();
 
 check_indent();
@@ -430,6 +432,9 @@ check_indent();
 # any non-option arguments are files to be processed
 push(@files, @ARGV);
 
+# the exclude list applies to command line arguments as well as found files
+process_exclude();
+
 foreach my $source_filename (@files)
 {
 	# ignore anything that's not a .c or .h file


plpython vs _POSIX_C_SOURCE

2023-01-24 Thread Andres Freund
Hi,

A recent commit of mine [1] broke compilation of plpython on AIX [2]. But my
commit turns out to only be very tangentially related - it only causes a
failure because it references clock_gettime() in an inline function instead of
a macro and, as it turns out, plpython currently breaks references to
clock_gettime() and lots of other things.

From [2]
> There's nice bit in plpython.h:
>
> /*
>  * Include order should be: postgres.h, other postgres headers, plpython.h,
>  * other plpython headers.  (In practice, other plpython headers will also
>  * include this file, so that they can compile standalone.)
>  */
> #ifndef POSTGRES_H
> #error postgres.h must be included before plpython.h
> #endif
>
> /*
>  * Undefine some things that get (re)defined in the Python headers. They 
> aren't
>  * used by the PL/Python code, and all PostgreSQL headers should be included
>  * earlier, so this should be pretty safe.
>  */
> #undef _POSIX_C_SOURCE
> #undef _XOPEN_SOURCE
>
>
> the relevant stuff in time.h is indeed guarded by
> #if _XOPEN_SOURCE>=500
>
>
> I don't think the plpython actually code follows the rule about including all
> postgres headers earlier.
>
> plpy_typeio.h:
>
> #include "access/htup.h"
> #include "fmgr.h"
> #include "plpython.h"
> #include "utils/typcache.h"
> [...]
>
> The include order aspect was perhaps feasible when there just was plpython.c,
> but with the split into many different C files and many headers, it seems hard
> to maintain. There's a lot of violations afaics.
> 
> The undefines were added in a11cf433413, the split in 147c2482542.


The background for the undefines is that _POSIX_C_SOURCE needs to be defined
the same for the whole compilation, not change in the middle, and Python.h
defines it. To protect "our" parts a11cf433413 instituted the rule that all
postgres headers have to be included first. But then that promptly got broken
in 147c2482542.

But apparently the breakage in 147c2482542 was partial enough that we didn't
run into obvious trouble so far (although I wonder if some of the linkage
issues we had in the past with plpython could be related).


I don't see a good solution here. I don't think the include order can
trivially be repaired, as long as plpy_*.h headers include postgres headers,
because there's no way to order two plpy_*.h includes in a .c file and have
all postgres headers come first.


The most minimal fix I can see is to institute the rule that no plpy_*.h
header can include a postgres header other than plpython.h.


A completely different approach would be for our build to acquire the values
of _POSIX_C_SOURCE and _XOPEN_SOURCE from Python.h and define them when
compiling plpython .c files. That has some danger of incompatibilities with
the rest of the build, though.  But it'd allow us to get rid of an obviously
hard-to-enforce rule.

Or we could see what breaks if we just don't care about _POSIX_C_SOURCE -
evidently we haven't succeeded in making this work for a long time.


Some other semi-related things:

An old comment:
/* Also hide away errcode, since we load Python.h before postgres.h */
#define errcode __msvc_errcode

but we don't include Python.h before postgres.h...

We try to force linking to a non-debug python:
#if defined(_MSC_VER) && defined(_DEBUG)
/* Python uses #pragma to bring in a non-default libpython on VC++ if
 * _DEBUG is defined */
#undef _DEBUG

Which seems ill-advised? That's from d8f75d41315 in 2006.


python scribbling over our macros:
/*
 * Sometimes python carefully scribbles on our *printf macros.
 * So we undefine them here and redefine them after it's done its dirty deed.

I didn't find code in recent-ish python to do that. Perhaps we should try to
get away with not doing that?


Greetings,

Andres Freund

[1] https://postgr.es/m/e1pj6nt-004joh...@gemulon.postgresql.org
[2] https://postgr.es/m/20230121190303.7xjiwdg3gvb62...@awork3.anarazel.de




Re: cutting down the TODO list thread

2023-01-24 Thread Bruce Momjian
On Tue, Jan 24, 2023 at 10:46:34AM +0700, John Naylor wrote:
> 
> On Wed, Jan 18, 2023 at 3:13 AM Bruce Momjian  wrote:
> 
> I think we risk overloading people with too many words above, and they
> will not read it fully.  Can it be simplified?  I wonder if some of this
> belongs in the developer's FAQ, linked from the TODO list.
> 
> I think you're right. That further drives home what I mentioned a few days 
> ago:
> Maybe we don't need to change much in the actual list itself, but rephrase
> references to it from elsewhere. Here's a first draft to see what that would

The TODO list started as a way to record items that were either bugs or
features we couldn't yet implement.  At this point, it is more features
we are either not sure we want or don't know how to implement.  This
should be clearly specified wherever the TODO list is referenced.  I
think once this is done it should be clear how users and developers
should view that list.  I think the top of the TODO list makes that
clear:

This list contains some known PostgreSQL bugs, some feature
requests, and some things we are not even sure we want. Many of
these items are hard, and some are perhaps impossible. If you
would like to work on an item, please read the Developer FAQ
first. There is also a development information page.

The rest of the text appears clear to me too.

> look like:
> --
> https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F#TODOs
> 
> from:
> "PostgreSQL maintains a TODO list on our wiki: 
> http://wiki.postgresql.org/wiki/
> TODO
> 
> We have attempted to organize issues and link in relevant discussions from our
> mailing list archives. Please read background information if it is available
> before attempting to resolve a TODO. If no background information is 
> available,
> it is appropriate to post a question to pgsql-hack...@postgresql.org and
> request additional information and inquire about the status of any ongoing 
> work
> on the problem."
> 
> to:
> "It's worth checking if the feature of interest is found in the TODO list on
> our wiki: http://wiki.postgresql.org/wiki/TODO. That list contains some known
> PostgreSQL bugs, some feature requests, and some things we are not even sure 
> we
> want. Many entries have a link to an email thread containing prior discussion,
> or perhaps attempts that for whatever reason didn't make it as far as getting
> committed."
> 
> ...which might make more sense if moved below the "brand new features" 
> section.

I think we just point them at the TODO list and they will read the top
of the list first, no?  I think you are right that we updated the top of
the TODO but didn't update the places that link to it.  I am thinking we
should just trim down the text linking to it and let the top of the TODO
list do its job.

> --
> https://wiki.postgresql.org/wiki/Developer_FAQ
> 
> 1)
> from:
> "What areas need work?
> Outstanding features are detailed in Todo.
> 
> You can learn more about these features by consulting the archives, the SQL
> standards and the recommended texts (see books for developers)."
> 
> to:
> ??? -> For "what areas need work?", we need to have a different answer, but 
> I'm
> not sure what it is.

Wow, I would not send a new person to the SQL standard docs.  ;-)  I am
thinking we just don't have a good answer to this so let's say less.

> 2)
> from:
> "What do I do after choosing an item to work on?
> 
> Send an email to pgsql-hackers with a proposal for what you want to do
> (assuming your contribution is not trivial). Working in isolation is not
> advisable because others might be working on the same TODO item, or you might
> have misunderstood the TODO item. In the email, discuss both the internal
> implementation method you plan to use, and any user-visible changes (new
> syntax, etc)."
> 
> to:
> "What do I do after choosing an area to work on?
> 
> Send an email to pgsql-hackers with a proposal for what you want to do
> (assuming your contribution is not trivial). Working in isolation is not

Can new people identify trivial?

> advisable because experience has shown that there are often requirements that
> are not obvious, and if those are not agreed on beforehand it leads to wasted
> effort. In the email, discuss both the internal implementation method you plan
> to use, and any user-visible changes (new syntax, etc)."
> 
> > On Wed, Jan 11, 2023 at 02:09:56PM +0700, John Naylor wrote:
> > > I've come up with some revised language, including s/15/16/ and removing
> the
> > > category of "[E]" (easier to implement), since it wouldn't be here if it
> were
> > > actually easy:
> >
> > I think it is still possible for a simple item to be identified as
> > wanted and easy, but not completed and put on the TODO list.
> 
> Theoretically it's possible, but in practice no one puts any easy items here.
> Currently, there are three marked as easy:
> 
> pg_dump:
> - Dump security labels and comments on 

Re: Unicode grapheme clusters

2023-01-24 Thread Isaac Morland
On Tue, 24 Jan 2023 at 11:40, Greg Stark  wrote:

>
> At the end of the day Unicode kind of assumes a variable-width display
> where the rendering is handled by something that has access to the
> actual font metrics. So anything trying to line things up in columns
> in a way that works with any rendering system down the line using any
> font is going to be making a best guess.
>

Really what is needed is another Unicode attribute: how many columns of a
monospaced display each character (or grapheme cluster) should take up. The
standard should include a precisely defined function that can take any
sequence of characters and give back its width in monospaced display
character spaces. Typefaces should only qualify as monospaced if they
respect this standard-defined computation.

Note that this is not actually a new thing: this was included in ASCII
implicitly, with a value of 1 for every character, and a value of n for
every n-character string. It has always been possible to line up values
displayed on monospaced displays by adding spaces, and it is only the
omission of this feature from Unicode which currently makes it impossible.
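
To see the related character-counting wrinkle on the database side, here is
a small sketch (assuming a UTF-8 server encoding; PostgreSQL's length()
counts characters, not grapheme clusters, so the counts already diverge from
what a terminal renders):

SELECT length(E'e\u0301')       AS characters,  -- 2: 'e' plus U+0301 combining acute
       octet_length(E'e\u0301') AS bytes;       -- 3 bytes in UTF-8, rendered as one "é"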


Re: run pgindent on a regular basis / scripted manner

2023-01-24 Thread Tom Lane
Jelte Fennema  writes:
> Sounds like this conflict could be handled fairly easily by
> having a local git hook rerunning pgindent whenever
> you rebase a commit:
> 1. if you changed typedefs.list the hook would format all files
> 2. if you didn't it only formats the files that you changed

I think that would be undesirable, because then reindentation noise
in completely-unrelated files would get baked into feature commits,
complicating review and messing up "git blame" history.
The approach we currently have allows reindent effects to be
separated into ignorable labeled commits, which is a nice property.

> Merge failures are one issue. But personally the main benefit that
> I would be getting is being able to run pgindent on the files
> I'm editing and get this weird +12 columns formatting correct
> without having to manually type it. Without pgindent also
> changing random parts of the files that someone else touched
> a few commits before me.

Yeah, that always annoys me too, but I've always considered that
it's my problem not something I can externalize onto other people.
The real bottom line here is that AFAICT, there are fewer committers
who care about indent cleanliness than committers who do not, so
I do not think that the former group get to impose strict rules
on the latter, much as I might wish otherwise.

FWIW, Andrew's recent --show-diff feature for pgindent has
already improved my workflow for that.  I can do
"pgindent --show-diff >fixindent.patch", manually remove any hunks
in fixindent.patch that don't pertain to the code I'm working on,
and apply what remains to fix up my new code.  (I had been doing
something basically like this, but with more file-copying steps
to undo pgindent's edit-in-place behavior.)

regards, tom lane




Re: Unicode grapheme clusters

2023-01-24 Thread Greg Stark
On Sat, 21 Jan 2023 at 13:17, Tom Lane  wrote:
>
> Probably our long-term answer is to avoid depending on wcwidth
> and use wcswidth instead.  But it's hard to get excited about
> doing the legwork for that until popular libc implementations
> get it right.

Here's an interesting blog post about trying to do this in Rust:

https://tomdebruijn.com/posts/rust-string-length-width-calculations/

TL;DR... Even counting the number of graphemes isn't enough because
terminals typically (but not always) display emoji graphemes using two
columns.

At the end of the day Unicode kind of assumes a variable-width display
where the rendering is handled by something that has access to the
actual font metrics. So anything trying to line things up in columns
in a way that works with any rendering system down the line using any
font is going to be making a best guess.

-- 
greg




Re: run pgindent on a regular basis / scripted manner

2023-01-24 Thread Jelte Fennema
> As a concrete example, suppose Alice commits some code that uses "foo"
> as a variable name, and more or less concurrently, Bob commits something
> that defines "foo" as a typedef.  Bob's change is likely to have
> side-effects on the formatting of Alice's code.  If they're working in
> well-separated parts of the source tree, nobody is likely to notice
> that for awhile --- but whoever next touches the files Alice touched
> will be in for a surprise, which will be more or less painful depending
> on whether we've installed brittle processes.

Sounds like this conflict could be handled fairly easily by
having a local git hook rerunning pgindent whenever
you rebase a commit:
1. if you changed typedefs.list the hook would format all files
2. if you didn't it only formats the files that you changed

> As another example, the mechanisms we use to create the typedefs list
> in the first place are pretty squishy/leaky: they depend on which
> buildfarm animals are running the typedef-generation step, and on
> whether anything's broken lately in that code --- which happens on
> a fairly regular basis (eg [1]).  Maybe that could be improved,
> but I don't see an easy way to capture the set of system-defined
> typedefs that are in use on platforms other than your own.  I
> definitely do not want to go over to hand maintenance of that list.

Wouldn't the automatic addition-only solution that Andres suggested
solve this issue? Build farms could still remove unused typedefs
on a regular basis, but commits would at least add typedefs for the
platform that the committer uses.

> I think we need to be content with a "soft", it's more-or-less-right
> approach to indentation.

I think that this would already be a significant improvement over the
current situation. My experience with the current situation is that
indentation is more-or-less-wrong.

> As I explained to somebody upthread, the
> main benefit of this for most people is avoiding the need for a massive
> once-a-year reindent run that causes merge failures for many pending
> patches.

Merge failures are one issue. But personally the main benefit that
I would be getting is being able to run pgindent on the files
I'm editing and get this weird +12 columns formatting correct
without having to manually type it. Without pgindent also
changing random parts of the files that someone else touched
a few commits before me.




Re: CREATEROLE users vs. role properties

2023-01-24 Thread Robert Haas
On Tue, Jan 24, 2023 at 9:07 AM tushar  wrote:
> Right, Neha and I have tested different scenarios using
> createdb/replication/bypassrls and other privilege properties on the
> role. We also checked pg_dumpall/pg_basebackup and everything looks fine.

Thanks. I have committed the patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: old_snapshot_threshold bottleneck on replica

2023-01-24 Thread Robert Haas
On Mon, Jan 23, 2023 at 9:40 AM Maxim Orlov  wrote:
> One of our customers stumbled onto a significant performance degradation
> while running multiple OLAP-like queries on a replica.
> After some investigation, it became clear that the problem is in accessing 
> old_snapshot_threshold parameter.

It has been suggested that we remove that feature entirely.

> My proposal is to use atomic for threshold_timestamp and threshold_xid. PFA 
> 0001 patch.

This patch changes threshold_timestamp and threshold_xid to use
atomics, but it does not remove mutex_threshold which, according to a
quick glance at the comments, protects exactly those two fields. So,
either:

(1) that mutex also protects something else and the existing comment
is wrong, or

(2) the mutex should have been removed but the patch neglected to do so, or

(3) the mutex is still needed for some reason, in which case either
(3a) the patch isn't actually safe or (3b) the patch needs comments to
explain what the new synchronization model is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgindent vs variable declaration across multiple lines

2023-01-24 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-01-22 Su 17:34, Tom Lane wrote:
>> I've also attached a diff
>> representing the delta between what current pg_bsd_indent wants to do
>> to HEAD and what this would do.  All the changes it wants to make look
>> good, although I can't say whether there are other places it's failing
>> to change that we'd like it to.

> Changes look good. There are a handful of places where I think the code
> would be slightly more readable if a leading typecast were moved to the
> second line.

Possibly, but that's the sort of decision that pgindent leaves to human
judgment I think.  It'll reflow comment blocks across lines, but I don't
recall having seen it move line breaks within code.

regards, tom lane




Re: Making Vars outer-join aware

2023-01-24 Thread Tom Lane
Hans Buschmann  writes:
> I just noticed your new efforts in this area.
> I wanted to recur to my old thread [1] considering constant propagation
> of quals.
> [1]  https://www.postgresql.org/message-id/1571413123735.26...@nidsa.net

Yeah, this patch series is not yet quite up to the point of improving
that.  That area is indeed the very next thing I want to work on, and
I did spend some effort on it last month, but I ran out of time to get
it working.  Maybe we'll have something there for v17.

regards, tom lane




Re: pgindent vs variable declaration across multiple lines

2023-01-24 Thread Andrew Dunstan


On 2023-01-22 Su 17:34, Tom Lane wrote:
> I've also attached a diff
> representing the delta between what current pg_bsd_indent wants to do
> to HEAD and what this would do.  All the changes it wants to make look
> good, although I can't say whether there are other places it's failing
> to change that we'd like it to.


Changes look good. There are a handful of places where I think the code
would be slightly more readable if a leading typecast were moved to the
second line.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Improve GetConfigOptionValues function

2023-01-24 Thread Tom Lane
Bharath Rupireddy  writes:
> On Mon, Jan 23, 2023 at 9:51 PM Tom Lane  wrote:
>> Also, I intentionally dropped the GUC_NO_SHOW_ALL check in
>> get_explain_guc_options, because it seems redundant given
>> the preceding GUC_EXPLAIN check.  It's unlikely we'd ever have
>> a variable that's marked both GUC_EXPLAIN and GUC_NO_SHOW_ALL ...
>> but if we did, shouldn't the former take precedence here anyway?

> You're right, but there's nothing that prevents users writing GUCs
> with GUC_EXPLAIN and GUC_NO_SHOW_ALL.

"Users"?  You do realize those flags are only settable by C code,
right?  Moreover, you haven't explained why it would be good that
you can't get at the behavior that a GUC is both shown in EXPLAIN
and not shown in SHOW ALL.  If you want "not shown by either",
that's already accessible by setting only the GUC_NO_SHOW_ALL
flag.  So I'd almost argue this is a bug fix, though I concede
it's a bit hard to imagine why somebody would want that choice.
Still, if we have two independent flags they should produce four
behaviors, not just three.

regards, tom lane




Re: to_hex() for negative inputs

2023-01-24 Thread Dean Rasheed
On Tue, 24 Jan 2023 at 13:43, Aleksander Alekseev
 wrote:
>
> Adding extra arguments for something the user can implement
> (him/her)self doesn't seem to be a great idea. With this approach we
> may end up with hundreds of arguments one day.
>

I don't see how a couple of extra arguments will expand to hundreds.

> =# select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
> from ( values (123), (-123) ) as s(x);
>

Which is broken for INT_MIN:

select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
from ( values (-2147483648) ) as s(x);

ERROR:  integer out of range

Part of the reason for implementing this in core would be to save
users from such easy-to-overlook bugs.
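
For completeness, a workaround that survives INT_MIN today is to widen
before taking abs(); testing x < 0 rather than sign(x) > 0 also avoids
tagging zero with a minus sign.  A sketch:

select case when x < 0 then '-' else '' end || to_hex(abs(x::bigint))
from ( values (123), (-123), (-2147483648) ) as s(x);

This yields 7b, -7b and -80000000.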

Regards,
Dean




Re: run pgindent on a regular basis / scripted manner

2023-01-24 Thread Vladimir Sitnikov
Andres> Of course, but I somehow feel a change of formatting should be
Andres> reviewable to at least some degree

One way of reviewing the formatting changes is to compare the compiled
binaries.

If the binaries before and after formatting are the same, then there's a
high chance the behaviour is the same.

Vladimir


Re: Minimal logical decoding on standbys

2023-01-24 Thread Drouvot, Bertrand

Hi,

On 1/24/23 12:21 AM, Melanie Plageman wrote:
> I'm new to this thread and subject, but I had a few basic thoughts about
> the first patch in the set.

Thanks for looking at it!


> On Mon, Jan 23, 2023 at 12:03:35PM +0100, Drouvot, Bertrand wrote:
>> Please find V42 attached.
>>
>> From 3c206bd77831d507f4f95e1942eb26855524571a Mon Sep 17 00:00:00 2001
>> From: bdrouvotAWS 
>> Date: Mon, 23 Jan 2023 10:07:51 +
>> Subject: [PATCH v42 1/6] Add info in WAL records in preparation for logical
>>   slot conflict handling.
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> Overall design:
>>
>> 1. We want to enable logical decoding on standbys, but replay of WAL
>> from the primary might remove data that is needed by logical decoding,
>> causing replication conflicts much as hot standby does.
>
> It is a little confusing to mention replication conflicts in point 1. It
> makes it sound like it already logs a recovery conflict. Without the
> recovery conflict handling in this patchset, logical decoding of
> statements using data that has been removed will fail with some error
> like:
> ERROR:  could not map filenumber "xxx" to relation OID
> Part of what this patchset does is introduce the concept of a new kind
> of recovery conflict and a resolution process.


I think I understand what you mean; what about the following?

1. We want to enable logical decoding on standbys, but replay of WAL
from the primary might remove data that is needed by logical decoding,
causing errors on the standby.

To prevent those errors, a new replication conflict scenario needs to be
addressed (much as hot standby does).


>> 2. Our chosen strategy for dealing with this type of replication slot
>> is to invalidate logical slots for which needed data has been removed.
>>
>> 3. To do this we need the latestRemovedXid for each change, just as we
>> do for physical replication conflicts, but we also need to know
>> whether any particular change was to data that logical replication
>> might access.
>
> It isn't clear from the above sentence why you would need both. I think
> it has something to do with what is below (hot_standby_feedback being
> off), but I'm not sure, so the order is confusing.

Right, it has to deal with the xid horizons too. So the idea is to check
1) whether there is a risk of conflict and, 2) if there is a risk, whether
there is an actual conflict (based on the xid). I'll reword this part.


>> 4. We can't rely on the standby's relcache entries for this purpose in
>> any way, because the startup process can't access catalog contents.
>>
>> 5. Therefore every WAL record that potentially removes data from the
>> index or heap must carry a flag indicating whether or not it is one
>> that might be accessed during logical decoding.
>>
>> Why do we need this for logical decoding on standby?
>>
>> First, let's forget about logical decoding on standby and recall that
>> on a primary database, any catalog rows that may be needed by a logical
>> decoding replication slot are not removed.
>>
>> This is done thanks to the catalog_xmin associated with the logical
>> replication slot.
>>
>> But, with logical decoding on standby, in the following cases:
>>
>> - hot_standby_feedback is off
>> - hot_standby_feedback is on but there is no physical slot between
>>   the primary and the standby. Then, hot_standby_feedback will work,
>>   but only while the connection is alive (for example a node restart
>>   would break it)
>>
>> Then, the primary may delete system catalog rows that could be needed
>> by the logical decoding on the standby (as it does not know about the
>> catalog_xmin on the standby).
>>
>> So, it's mandatory to identify those rows and invalidate the slots
>> that may need them if any. Identifying those rows is the purpose of
>> this commit.
>
> I would like a few more specifics about this commit (patch 1 in the set)
> itself in the commit message.
>
> I think it would be good to have the commit message mention what kinds
> of operations require WAL to contain information about whether or not it
> is operating on a catalog table and why this is.
>
> For example, I think the explanation of the feature makes it clear why
> vacuum and pruning operations would require isCatalogRel, however it
> isn't immediately obvious why page reuse would.

What do you think about putting those extra explanations in the code instead?


> Also, because the diff has so many function signatures changed, it is
> hard to tell which xlog record types are actually being changed to
> include isCatalogRel. It might be too detailed/repetitive for the final
> commit message to have a list of the xlog types requiring this info
> (gistxlogPageReuse, spgxlogVacuumRedirect, xl_hash_vacuum_one_page,
> xl_btree_reuse_page, xl_btree_delete, xl_heap_visible, xl_heap_prune,
> xl_heap_freeze_page) but perhaps you could enumerate all the general
> operations (freeze page, vacuum, prune, etc).

Right, at the end there are only a few making "real" use of it: they can be
enumerated in the commit message. Will do.

>> Implementation:
>>
>> When a WAL replay on standby indicates that a catalog 

RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread Takamichi Osumi (Fujitsu)
On Tuesday, January 24, 2023 8:32 AM Euler Taveira  wrote:
> Good to know that you keep improving this patch. I have a few suggestions that
> were easier to provide a patch on top of your latest patch than to provide an
> inline suggestions.
Thanks for your review! We basically adopted your suggestions.


> There are a few documentation polishes. Let me comment on some of them above.
> 
> -   The length of time (ms) to delay the application of changes.
> +   Total time spent delaying the application of changes, in milliseconds
> 
> I don't remember if I suggested this description for catalog but IMO the
> suggestion reads better for me.
Adopted the above change.


> -   For time-delayed logical replication (i.e. when the subscription is
> -   created with parameter min_apply_delay > 0), the apply worker sends a
> -   Standby Status Update message to the publisher with a period of
> -   wal_receiver_status_interval. Make sure to set
> -   wal_receiver_status_interval less than the
> -   wal_sender_timeout on the publisher, otherwise, the
> -   walsender will repeatedly terminate due to the timeout errors. If
> -   wal_receiver_status_interval is set to zero, the 
> apply
> -   worker doesn't send any feedback messages during the subscriber's
> -   min_apply_delay period. See
> -for details.
> +   For time-delayed logical replication, the apply worker sends a 
> feedback
> +   message to the publisher every
> +   wal_receiver_status_interval milliseconds. Make 
> sure
> +   to set wal_receiver_status_interval less than the
> +   wal_sender_timeout on the publisher, otherwise, the
> +   walsender will repeatedly terminate due to timeout
> +   error. If wal_receiver_status_interval is set to
> +   zero, the apply worker doesn't send any feedback messages during the
> +   min_apply_delay interval.
> 
> I removed the parenthesis explanation about time-delayed logical replication.
> If you are reading the documentation and does not know what it means you 
> should
> (a) read the logical replication chapter or (b) check the glossary (maybe a 
> new
> entry should be added). I also removed the Standby status Update message but 
> it
> is a low level detail; let's refer to it as feedback message as the other
> sentences do. I changed "literal" to "varname" that's the correct tag for
> parameters. I replace "period" with "interval" that was the previous
> terminology. IMO we should be uniform, use one or the other.
Adopted.

Also, I added a glossary entry for time-delayed replication (one applicable
to both physical and logical replication).
Plus, I unified the terminology on "period" rather than "interval", because
it clarifies the type of this feature. I think this is better.

> -   The subscriber replication can be instructed to lag behind the publisher
> -   side changes by specifying the min_apply_delay
> -   subscription parameter. See  for
> -   details.
> +   A logical replication subscription can delay the application of changes by
> +   specifying the min_apply_delay subscription parameter.
> +   See  for details.
> 
> This feature refers to a specific subscription, hence, "logical replication
> subscription" instead of "subscriber replication".
Adopted.
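
For reference, a minimal usage sketch of the parameter under discussion (the
subscription, connection and publication names are illustrative, and the
value is in milliseconds per the catalog description quoted above):

CREATE SUBSCRIPTION mysub
    CONNECTION 'host=primary dbname=postgres'
    PUBLICATION mypub
    WITH (min_apply_delay = 3000);  -- delay applying changes by 3 seconds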

> +   if (IsSet(opts->specified_opts, SUBOPT_MIN_APPLY_DELAY))
> +   errorConflictingDefElem(defel, pstate);
> +
> 
> Peter S referred to this missing piece of code too.
Added.


> -int
> +static int
> defGetMinApplyDelay(DefElem *def)
> {
> 
> It seems you forgot static keyword.
Fixed.


> -   elog(DEBUG2, "time-delayed replication for txid %u, min_apply_delay = 
> %lld ms, Remaining wait time: %ld ms",
> -xid, (long long) MySubscription->minapplydelay, diffms);
> +   elog(DEBUG2, "time-delayed replication for txid %u, min_apply_delay = 
> " INT64_FORMAT " ms, remaining wait time: %ld ms",
> +xid, MySubscription->minapplydelay, diffms);
> int64 should use format modifier INT64_FORMAT.
Fixed.


> - (long) wal_receiver_status_interval * 1000,
> + wal_receiver_status_interval * 1000L,
> 
> Cast is not required. I added a suffix to the constant.
Fixed.


> -   elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, 
> flush %X/%X in-delayed: %d",
> +   elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, 
> flush %X/%X, apply delay: %s",
>  force,
>  LSN_FORMAT_ARGS(recvpos),
>  LSN_FORMAT_ARGS(writepos),
>  LSN_FORMAT_ARGS(flushpos),
> -in_delayed_apply);
> +in_delayed_apply? "yes" : "no");
> 
> It is better to use a string to represent the yes/no option.
Fixed.


> - gettext_noop("Min apply delay (ms)"));
> + gettext_noop("Min apply delay"));
> 
> I don't know if it was discussed but we don't add units to headers. When I
> think about this parameter 

Re: run pgindent on a regular basis / scripted manner

2023-01-24 Thread Tom Lane
Jelte Fennema  writes:
>> One issue on requiring patches to have run pgindent previously is
>> actually the typedef list.  If someone adds a typedef in a commit, they
>> will see different pgident output in the committed files, and perhaps
>> others, and the new typedefs might only appear after the commit, causing
>> later commits to not match.

> I'm not sure I understand the issue you're pointing out. If someone
> changes the typedef list, imho they want the formatting to change
> because of that. So requiring an addition to the typedef list to also
> commit reindentation to all files that this typedef indirectly impacts
> seems pretty reasonable to me.

I think the issue Bruce is pointing out is that this is another mechanism
whereby different people could get different indentation results.
I fear any policy that is based on an assumption that indentation has
One True Result is going to fail.

As a concrete example, suppose Alice commits some code that uses "foo"
as a variable name, and more or less concurrently, Bob commits something
that defines "foo" as a typedef.  Bob's change is likely to have
side-effects on the formatting of Alice's code.  If they're working in
well-separated parts of the source tree, nobody is likely to notice
that for awhile --- but whoever next touches the files Alice touched
will be in for a surprise, which will be more or less painful depending
on whether we've installed brittle processes.

As another example, the mechanisms we use to create the typedefs list
in the first place are pretty squishy/leaky: they depend on which
buildfarm animals are running the typedef-generation step, and on
whether anything's broken lately in that code --- which happens on
a fairly regular basis (eg [1]).  Maybe that could be improved,
but I don't see an easy way to capture the set of system-defined
typedefs that are in use on platforms other than your own.  I
definitely do not want to go over to hand maintenance of that list.

I think we need to be content with a "soft", it's more-or-less-right
approach to indentation.  As I explained to somebody upthread, the
main benefit of this for most people is avoiding the need for a massive
once-a-year reindent run that causes merge failures for many pending
patches.  But we don't need to completely eliminate such runs to get
99.9% of that benefit; we only need to reduce the number of places
they're likely to touch.

regards, tom lane

[1] https://github.com/PGBuildFarm/client-code/commit/dcca861554e90d6395c3c153317b0b0e3841f103




Re: Improve GetConfigOptionValues function

2023-01-24 Thread Bharath Rupireddy
On Mon, Jan 23, 2023 at 9:51 PM Tom Lane  wrote:
>
> Bharath Rupireddy  writes:
> > LGTM. I've marked it RfC.
>
> After looking at this, it seemed to me that the factorization
> wasn't quite right after all: specifically, the new function
> could be used in several more places if it confines itself to
> being a privilege check and doesn't consider GUC_NO_SHOW_ALL.
> So more like the attached.

Thanks. It looks even cleaner now.

> Also, I intentionally dropped the GUC_NO_SHOW_ALL check in
> get_explain_guc_options, because it seems redundant given
> the preceding GUC_EXPLAIN check.  It's unlikely we'd ever have
> a variable that's marked both GUC_EXPLAIN and GUC_NO_SHOW_ALL ...
> but if we did, shouldn't the former take precedence here anyway?

You're right, but there's nothing that prevents users from writing GUCs
with both GUC_EXPLAIN and GUC_NO_SHOW_ALL. FWIW, I prefer retaining the
behaviour as-is, i.e., keeping the explicit "if (conf->flags &
GUC_NO_SHOW_ALL) continue;" in get_explain_guc_options().

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Exclusion constraints on partitioned tables

2023-01-24 Thread Ronan Dunklau
On Friday, December 16, 2022 at 06:11:49 CET, Paul Jungwirth wrote:
> On 12/15/22 16:12, Tom Lane wrote:
> >> This patch also requires the matching constraint columns to use equality
> >> comparisons (`(foo WITH =)`), so it is really equivalent to the existing
> >> b-tree rule.
> > 
> > That's not quite good enough: you'd better enforce that it's the same
> > equality operator (and same collation, if relevant) as is being used
> > in the partition key.
> > [snip]
> > It might work better to consider the operator itself and ask if
> > it's equality in the same btree opfamily that's used by the
> > partition key.
> 
> Thank you for taking a look! Here is a comparison on just the operator
> itself.
> 

I've taken a look at the patch, and I'm not sure why you keep the restriction
that the GiST operator must be of the RTEqualStrategyNumber strategy. I don't
think we have any other place where we expect those strategy numbers to
match. For hash it's different, as hash equality is the only operator
strategy and as such there is no other way to look at it. Can't we just
enforce partition_operator == exclusion_operator without bringing
RTEqualStrategyNumber for the opfamily into the mix?
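
For concreteness, the shape of constraint under discussion looks roughly
like this (a sketch assuming the patch's rule that the partition key columns
appear with an equality operator; btree_gist supplies the integer equality
operator class for GiST, and the table and column names are illustrative):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservations (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
) PARTITION BY LIST (room);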








