Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-19 Thread Masahiko Sawada
On Thu, Jan 12, 2023 at 11:02 PM Imseih (AWS), Sami  wrote:
>
> Thanks for the feedback and I apologize for the delay in response.
>
> > I think the problem here is that you're basically trying to work around
> > the lack of an asynchronous state update mechanism between leader and
> > workers. The workaround is to add a lot of different places that poll
> > whether there has been any progress. And you're not doing that by
> > integrating with the existing machinery for interrupt processing (i.e.
> > CHECK_FOR_INTERRUPTS()), but by developing a new mechanism.
> >
> > I think your best bet would be to integrate with HandleParallelMessages().
>
> You are correct. I have been trying to work around the async nature
> of parallel workers performing the index vacuum. As you have pointed out,
> integrating with HandleParallelMessages does appear to be the proper way.
> Doing so will also avoid having to do any progress updates in the index AMs.

Very interesting idea. I need to study the parallel query
infrastructure more to consider the potential downsides of this idea,
but it seems okay as far as I have researched so far.

> In the attached patch, the parallel workers send a new protocol message
> type to the leader called 'P', which signals the leader that it should
> handle a progress update. The leader then performs the progress update by
> invoking a callback set in the ParallelContext. This is done inside
> HandleParallelMessages.
> In the index vacuum case, the callback is parallel_vacuum_update_progress.
> In the index vacuum case, the callback is parallel_vacuum_update_progress.
>
> The new message does not contain a payload, and it's merely used to
> signal the leader that it can invoke a progress update.

Thank you for updating the patch. Here are some comments on the v22 patch:

---
+  
+   Number of indexes that will be vacuumed or cleaned up. This value will be
+   0 if the phase is not vacuuming indexes
+   or cleaning up indexes, INDEX_CLEANUP
+   is set to OFF, index vacuum is skipped due to very
+   few dead tuples in the table, or vacuum failsafe is triggered.

I think that if INDEX_CLEANUP is set to OFF or index vacuum is skipped
due to failsafe mode, we enter neither vacuum indexes phase nor
cleanup indexes phase. So probably we can say something like:

Number of indexes that will be vacuumed or cleaned up. This counter only
advances when the phase is vacuuming indexes or cleaning up indexes.

---
-/* Report that we are now vacuuming indexes */
-pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
-                             PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
+/*
+ * Report that we are now vacuuming indexes
+ * and the number of indexes to vacuum.
+ */
+progress_start_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_INDEX;
+progress_start_val[1] = vacrel->nindexes;
+pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);

According to our code style guideline[1], we limit line lengths so
that the code is readable in an 80-column window. Some comment lines
updated in this patch seem to be wrapped earlier than necessary.

---
+StringInfoData msgbuf;
+
+pq_beginmessage(&msgbuf, 'P');
+pq_endmessage(&msgbuf);

I think we can use pq_putmessage() instead.

---
+/* progress callback definition */
+typedef void (*ParallelProgressCallback) (void
*parallel_progress_callback_state);

I think it's better to define "void *arg".

---
+/*
+ * A Leader process that receives this message
+ * must be ready to update progress.
+ */
+Assert(pcxt->parallel_progress_callback);
+Assert(pcxt->parallel_progress_callback_arg);
+
+/* Report progress */
+pcxt->parallel_progress_callback(pcxt->parallel_progress_callback_arg);

I think the parallel query infra should not require
parallel_progress_callback_arg to always be set. I think it can be
NULL.

---
+void
+parallel_vacuum_update_progress(void *arg)
+{
+    ParallelVacuumState *pvs = (ParallelVacuumState *) arg;
+
+    Assert(!IsParallelWorker());
+
+    if (pvs)
+        pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
+                                     pg_atomic_add_fetch_u32(&(pvs->shared->nindexes_completed), 1));
+}

Since parallel vacuum always sets the arg, I think we don't need to check it.

Regards,

[1] https://www.postgresql.org/docs/devel/source-format.html

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-19 Thread Peter Smith
On Fri, Jan 20, 2023 at 2:47 PM shveta malik  wrote:
>
...
> 2)
>  Logging:
> 2023-01-19 17:33:16.202 IST [404797] DEBUG:  logical replication apply delay: 19979 ms
> 2023-01-19 17:33:26.212 IST [404797] DEBUG:  logical replication apply delay: 9969 ms
> 2023-01-19 17:34:25.730 IST [404962] DEBUG:  logical replication apply delay: 179988 ms  --> previous wait over, started for next txn
> 2023-01-19 17:34:35.737 IST [404962] DEBUG:  logical replication apply delay: 169981 ms
> 2023-01-19 17:34:45.746 IST [404962] DEBUG:  logical replication apply delay: 159972 ms
>
> Is there a way to distinguish between these logs? Maybe dumping xids
> along with?
>

+1

Also, I was thinking of some other logging enhancements:

a) the message should say that this is the *remaining* time left to wait.

b) it might be convenient to know from the log what was the original
min_apply_delay value in the 1st place.

For example, the logs might look something like this:

DEBUG: time-delayed replication for txid 1234, min_apply_delay = 16 ms. Remaining wait time: 159972 ms
DEBUG: time-delayed replication for txid 1234, min_apply_delay = 16 ms. Remaining wait time: 142828 ms
DEBUG: time-delayed replication for txid 1234, min_apply_delay = 16 ms. Remaining wait time: 129994 ms
DEBUG: time-delayed replication for txid 1234, min_apply_delay = 16 ms. Remaining wait time: 110001 ms
...

--
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Logical replication timeout problem

2023-01-19 Thread wangw.f...@fujitsu.com
On Fri, Jan 20, 2023 at 10:10 AM Peter Smith  wrote:
> Here are some review comments for patch v3-0001.

Thanks for your comments.

> ==
> Commit message
> 
> 1.
> The problem is when there is a DDL in a transaction that generates lots of
> temporary data due to rewrite rules, these temporary data will not be
> processed by the pgoutput - plugin. Therefore, the previous fix (f95d53e)
> for DML had no impact on this case.
> 
> ~
> 
> 1a.
> IMO this comment needs to give a bit of background about the original
> problem here, rather than just starting with "The problem is" which is
> describing the flaws of the previous fix.

Added some related message.

> ~
> 
> 1b.
> "pgoutput - plugin" -> "pgoutput plugin" ??

Changed.

> ~~~
> 
> 2.
> 
> To fix this, we introduced a new ReorderBuffer callback -
> 'ReorderBufferUpdateProgressCB'. This callback is called to try to update
> the progress after each change has been processed during sending data of a
> transaction (and its subtransactions) to the output plugin.
> 
> IIUC it's not really "after each change" - shouldn't this comment
> mention something about the CHANGES_THRESHOLD 100?

Changed.

> ~~~
> 
> 4. update_progress_cb_wrapper
> 
> +/*
> + * Update progress callback
> + *
> + * Try to update progress and send a keepalive message if too many changes
> + * were processed when processing txn.
> + *
> + * For a large transaction, if we don't send any change to the downstream
> + * for a long time (exceeds the wal_receiver_timeout of standby) then it
> + * can timeout. This can happen when all or most of the changes are either
> + * not published or got filtered out.
> + */
> 
> SUGGESTION (instead of the "Try to update" sentence)
> Send a keepalive message whenever more than CHANGES_THRESHOLD
> changes are encountered while processing a transaction.

Since it's possible that keep-alive messages won't be sent even if the
threshold is reached (see function WalSndKeepaliveIfNecessary), I thought it
might be better to use "try to".
I also rewrote the comments here because the threshold logic was moved to
the function ReorderBufferProcessTXN.

> ~~~
> 
> 5.
> 
> +static void
> +update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
> +                           ReorderBufferChange *change)
> +{
> + LogicalDecodingContext *ctx = cache->private_data;
> + LogicalErrorCallbackState state;
> + ErrorContextCallback errcallback;
> + static int changes_count = 0; /* Static variable used to accumulate
> +                                * the number of changes while
> +                                * processing txn. */
> +
> 
> IMO this may be more readable if the static 'changes_count' local var
> was declared first and separated from the other vars by a blank line.

Changed.

> ~~~
> 
> 6.
> 
> + /*
> + * We don't want to try sending a keepalive message after processing each
> + * change as that can have overhead. Tests revealed that there is no
> + * noticeable overhead in doing it after continuously processing 100 or so
> + * changes.
> + */
> +#define CHANGES_THRESHOLD 100
> 
> 6a.
> I think it might be better to define this right at the top of the
> function adjacent to the 'changes_count' variable (e.g. a bit like the
> original HEAD code looked)

Changed.

> ~
> 
> 6b.
> SUGGESTION (for the comment)
> Sending keepalive messages after every change has some overhead, but
> testing showed there is no noticeable overhead if keepalive is only
> sent after every ~100 changes.

Changed.

> ~~~
> 
> 7.
> 
> +
> + /*
> + * After continuously processing CHANGES_THRESHOLD changes, we
> + * try to send a keepalive message if required.
> + */
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> + changes_count = 0;
> + }
> +
> 
> 7a.
> SUGGESTION (for comment)
> Send a keepalive message after every CHANGES_THRESHOLD changes.

Changed.

Regards,
Wang Wei


Re: generic plans and "initial" pruning

2023-01-19 Thread Amit Langote
On Fri, Jan 20, 2023 at 12:58 PM Tom Lane  wrote:
> Amit Langote  writes:
> > On Fri, Jan 20, 2023 at 12:31 PM Tom Lane  wrote:
> >> It might be possible to incorporate this pointer into PlannedStmt
> >> instead of passing it separately.
>
> > Yeah, that would be less churn.  Though, I wonder if you still hold
> > that PlannedStmt should not be scribbled upon outside the planner as
> > you said upthread [1]?
>
> Well, the whole point of that rule is that the executor can't modify
> a plancache entry.  If the plancache itself sets a field in such an
> entry, that doesn't seem problematic from here.
>
> But there's other possibilities if that bothers you; QueryDesc
> could hold the field, for example.  Also, I bet we'd want to copy
> it into EState for the main initialization recursion.

QueryDesc sounds good to me, and yes, also a copy in EState in any case.

So I started looking at the call sites of CreateQueryDesc() and
stopped to look at ExecParallelGetQueryDesc().  AFAICS, we wouldn't
need to pass the CachedPlan to a parallel worker's rerun of
InitPlan(), because 1) it doesn't make sense to call the plancache in
a parallel worker, and 2) the leader should already have taken all the
locks necessary for executing a given plan subnode that it intends
to pass to a worker in ExecInitGather().  Does that make sense?

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




RE: Logical replication timeout problem

2023-01-19 Thread wangw.f...@fujitsu.com
On Fri, Jan 20, 2023 at 12:35 PM Amit Kapila  wrote:
> On Fri, Jan 20, 2023 at 7:40 AM Peter Smith  wrote:
> >
> > Here are some review comments for patch v3-0001.
> >
> > ==
> > src/backend/replication/logical/logical.c
> >
> > 3. forward declaration
> >
> > +/* update progress callback */
> > +static void update_progress_cb_wrapper(ReorderBuffer *cache,
> > +ReorderBufferTXN *txn,
> > +ReorderBufferChange *change);
> >
> > I felt this function wrapper name was a bit misleading... AFAIK every
> > other wrapper really does just wrap their respective functions. But
> > this one seems a bit different because it calls the wrapped function
> > ONLY if some threshold is exceeded. IMO maybe this function could have
> > some name that conveys this better:
> >
> > e.g. update_progress_cb_wrapper_with_threshold
> >
> 
> I am wondering whether it would be better to move the threshold logic
> to the caller. Previously this logic was inside the function because
> it was being invoked from multiple places but now that won't be the
> case. Also, then your concern about the name would also be addressed.

Agreed. I moved the threshold logic to the function ReorderBufferProcessTXN.

> >
> > ~
> >
> > 7b.
> > Would it be neater to just call OutputPluginUpdateProgress here instead?
> >
> > e.g.
> > BEFORE
> > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> > AFTER
> > OutputPluginUpdateProgress(ctx, false);
> >
> 
> We already check whether ctx->update_progress is defined or not which
> is the only extra job done by OutputPluginUpdateProgress but probably
> we can consolidate the checks and directly invoke
> OutputPluginUpdateProgress.

Changed. I now invoke the function OutputPluginUpdateProgress directly in the
new callback.

Regards,
Wang Wei


RE: Logical replication timeout problem

2023-01-19 Thread wangw.f...@fujitsu.com
On Thu, Jan 19, 2023 at 19:37 PM Amit Kapila  wrote:
> On Thu, Jan 19, 2023 at 4:13 PM Ashutosh Bapat
>  wrote:
> >
> > On Wed, Jan 18, 2023 at 6:00 PM Amit Kapila 
> wrote:
> >
> > > + */
> > > + ReorderBufferUpdateProgressCB update_progress;
> > >
> > > Are you suggesting changing the name of the above variable? If so, how
> > > about apply_progress, progress, or updateprogress? If you don't like
> > > any of these then feel free to suggest something else. If we change
> > > the variable name then accordingly, we need to update
> > > ReorderBufferUpdateProgressCB as well.
> > >
> >
> > I would have liked all the callback names renamed with the prefix
> > "rbcb_xxx" so that they have very little chance of conflicting with
> > similar names in the code base. But it's probably late to do that :).
> >
> > How about update_txn_progress, since the CB is supposed to be used only
> > within a transaction? Or update_progress_txn?
> >
> 
> Personally, I would prefer 'apply_progress' as it would be similar to
> a few other callbacks like apply_change, apply_truncate, or as is
> proposed by patch update_progress again because it is similar to
> existing callbacks like commit_prepared. If you and others don't like
> any of those then we can go for 'update_progress_txn' as well. Anybody
> else has an opinion on this?

I think 'update_progress_txn' might be better, because this name makes it
easier to see that this callback is used to update progress while
processing a txn. So, I renamed it to 'update_progress_txn'.

I have addressed all the comments and here is the new version of the patch.

Regards,
Wang Wei


v4-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description:  v4-0001-Fix-the-logical-replication-timeout-during-proces.patch


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-19 Thread Peter Smith
Hi Osumi-san, here are my review comments for the latest patch v17-0001.

==
Commit Message

1.
Prohibit the combination of this feature and parallel streaming mode.

SUGGESTION (using the same wording as in the code comments)
The combination of parallel streaming mode and min_apply_delay is not allowed.

==
doc/src/sgml/ref/create_subscription.sgml

2.
+ 
+  By default, the subscriber applies changes as soon as possible. This
+  parameter allows the user to delay the application of changes by a
+  specified amount of time. If the value is specified without units, it
+  is taken as milliseconds. The default is zero (no delay).
+ 

Looking at this again, it seemed a bit strange to repeat "specified"
twice in 2 sentences. Maybe change one of them.

I've also suggested using the word "interval" because I don't think the
docs yet mention anywhere (except in the example) that using
intervals is possible.

SUGGESTION (for the 2nd sentence)
This parameter allows the user to delay the application of changes by
a given time interval.

~~~

3.
+ 
+  Any delay occurs only on WAL records for transaction begins after all
+  initial table synchronization has finished. The delay is calculated
+  between the WAL timestamp as written on the publisher and the current
+  time on the subscriber. Any overhead of time spent in logical decoding
+  and in transferring the transaction may reduce the actual wait time.
+  It is also possible that the overhead already execeeds the requested
+  min_apply_delay value, in which case no additional
+  wait is necessary. If the system clocks on publisher and subscriber
+  are not synchronized, this may lead to apply changes earlier than
+  expected, but this is not a major issue because this parameter is
+  typically much larger than the time deviations between servers. Note
+  that if this parameter is set to a long delay, the replication will
+  stop if the replication slot falls behind the current LSN by more than
+  max_slot_wal_keep_size.
+ 

3a.
Typo "execeeds" (I think Vignesh reported this already)

~

3b.
SUGGESTION (for the 2nd sentence)
BEFORE
The delay is calculated between the WAL timestamp...
AFTER
The delay is calculated as the difference between the WAL timestamp...

~~~

4.
+ 
+   
+    Delaying the replication can mean there is a much longer time between
+    making a change on the publisher, and that change being committed on
+    the subscriber.
+    See .
+   
+ 

IMO maybe there is a better way to express the 2nd sentence:

BEFORE
This can have a big impact on synchronous replication.
AFTER
This can impact the performance of synchronous replication.

==
src/backend/commands/subscriptioncmds.c

5. parse_subscription_options

@@ -324,6 +328,43 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
  opts->specified_opts |= SUBOPT_LSN;
  opts->lsn = lsn;
  }
+ else if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
+          strcmp(defel->defname, "min_apply_delay") == 0)
+ {
+ char       *val,
+            *tmp;
+ Interval   *interval;
+ int64       ms;

IMO 'delay_ms' (or similar) would be a friendlier variable name than just 'ms'

~~~

6.
@@ -404,6 +445,20 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
  "slot_name = NONE", "create_slot = false")));
  }
  }
+
+ /*
+ * The combination of parallel streaming mode and min_apply_delay is not
+ * allowed.
+ */
+ if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
+ opts->min_apply_delay > 0)
+ {
+ if (opts->streaming == LOGICALREP_STREAM_PARALLEL)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("%s and %s are mutually exclusive options",
+"min_apply_delay > 0", "streaming = parallel"));
+ }

This could be expressed as a single condition using &&, maybe also
with the brackets eliminated. (Unless you feel the current code is
more readable)

~~~

7.

+ if (opts.min_apply_delay > 0)
+ if ((IsSet(opts.specified_opts, SUBOPT_STREAMING) && opts.streaming == LOGICALREP_STREAM_PARALLEL) ||
+     (!IsSet(opts.specified_opts, SUBOPT_STREAMING) && sub->stream == LOGICALREP_STREAM_PARALLEL))
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot enable %s for subscription in %s mode",
+        "min_apply_delay", "streaming = parallel"));

These nested ifs could instead be a single "if" with && condition.
(Unless you feel the current code is more readable)


==
src/backend/replication/logical/worker.c

8. maybe_delay_apply

+ * Hence, it's not appropriate to apply a delay at the time.
+ */
+static void
+maybe_delay_apply(TimestampTz finish_ts)

That last sentence "Hence,... delay at the time" does not sound
correct. Is there a typo or missing words here?

Maybe it meant to say "... at the STREAM START time."?

~~~

9.
+ /* This 

Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-19 Thread David Geier

On 1/18/23 13:52, David Geier wrote:

On 1/16/23 21:39, Pavel Stehule wrote:


On Mon, 16 Jan 2023 at 21:34, Tomas Vondra wrote:


Hi,

there's minor bitrot in the Mkvcbuild.pm change, making cfbot unhappy.

As for the patch, I don't have many comments. I'm wondering if it'd be
useful to indicate which timing source was actually used for EXPLAIN
ANALYZE, say something like:

 Planning time: 0.197 ms
 Execution time: 0.225 ms
 Timing source: clock_gettime (or tsc)

+1


I like the idea of exposing the timing source in the EXPLAIN ANALYZE
output. It's a good tradeoff between inspectability and effort, given
that RDTSC should always be better to use.

If there are no objections, I'll go this way.

Thinking about this a little more made me realize that this will cause
different pg_regress output depending on the platform. So if we go this
route we would at least need an option for EXPLAIN ANALYZE to disable
it. Or rather have it disabled by default and allow for enabling it.
Thoughts?


--
David Geier
(ServiceNow)





Re: Experiments with Postgres and SSL

2023-01-19 Thread Vladimir Sitnikov
>I don't think it's worth implementing a code path in
> the server like this as it would then become cruft that would be hard
> to ever get rid of.

Do you think the server can de-support the old code path soon?

> I think you can do the same thing, more or less, in the client. Like
> if the driver tries to connect via SSL and gets an error it remembers
> that host/port and connects using negotiation in the future.

Well, I doubt everybody would instantaneously upgrade to a database that
supports fast TLS, so there will be a timeframe with a lot of old
databases and new clients. In that case, going with "try fast, ignore
exception" would degrade performance for old databases.

I see you suggest caching; however, "degrading one of the cases" might be
more painful than "not improving one of the cases".

I would like to refrain from implementing "parallel connect both ways
and check which is faster" in
PG clients (e.g. https://en.wikipedia.org/wiki/Happy_Eyeballs ).

Just wondering: do you consider back-porting the feature to all supported
DB versions?

> In practice though, by the time drivers support this it'll probably be
> far enough in the future

I think drivers release more often than the database, and we can get driver
support even before the database releases.
I'm from pgjdbc Java driver team, and I think it is unfair to suggest that
"driver support is only far enough in the future".

Vladimir


Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-19 Thread Masahiko Sawada
On Thu, Jan 19, 2023 at 2:41 PM Amit Kapila  wrote:
>
> On Wed, Jan 18, 2023 at 12:09 PM Amit Kapila  wrote:
> >
> > On Fri, Jan 13, 2023 at 11:50 AM Peter Smith  wrote:
> > >
> > > Here are some review comments for patch v79-0002.
> > >
> >
> > So, this is about the latest v84-0001-Stop-extra-worker-if-GUC-was-changed.
> >
> > >
> > > I feel this patch just adds more complexity for almost no gain:
> > > - reducing the 'max_apply_workers_per_subscription' seems not very
> > > common in the first place.
> > > - even when the GUC is reduced, at that point in time all the workers
> > > might be in use so there may be nothing that can be immediately done.
> > > - IIUC the excess workers (for a reduced GUC) are going to get freed
> > > naturally anyway over time as more transactions are completed so the
> > > pool size will reduce accordingly.
> > >
> >
> > I am still not sure if it is worth pursuing this patch because of the
> > above reasons. I don't think it would be difficult to add this even at
> > a later point in time if we really see a use case for this.
> > Sawada-San, IIRC, you raised this point. What do you think?
> >
> > The other point I am wondering is whether we can have a different way
> > to test partial serialization apart from introducing another developer
> > GUC (stream_serialize_threshold). One possibility could be that we can
> > have a subscription option (parallel_send_timeout or something like
> > that) with some default value (current_timeout used in the patch)
> > which will be used only when streaming = parallel. Users may want to
> > wait for more time before serialization starts depending on the
> > workload (say when resource usage is high on a subscriber-side
> > machine, or there are concurrent long-running transactions that can
> > block parallel apply for a bit longer time). I know with this as well
> > it may not be straightforward to test the functionality because we
> > can't be sure how many changes would be required for a timeout to
> > occur. This is just for brainstorming other options to test the
> > partial serialization functionality.

I can see how the parallel_send_timeout idea could be somewhat useful, but
I'm concerned whether users can tune this value properly. It's likely to
indicate something abnormal or a locking issue if the LA waits to write
data to the queue for more than 10s. Also, I think it doesn't make sense
to allow users to set this timeout to a very low value. If switching to
partial serialization mode early is useful in some cases, I think it's
better to provide it as a new mode, such as streaming = 'parallel-file' etc.

>
> Apart from the above, we can also have a subscription option to
> specify parallel_shm_queue_size (queue_size used to determine the
> queue between the leader and parallel worker) and that can be used for
> this purpose. Basically, configuring it to a smaller value can help in
> reducing the test time but still, it will not eliminate the need for
> dependency on timing we have to wait before switching to partial
> serialize mode. I think this can be used in production as well to tune
> the performance depending on workload.

A parameter for the queue size is interesting but I agree it will not
eliminate the need for dependency on timing.

>
> Yet another way is to use the existing parameter logical_decoding_mode
> [1]. If the value of logical_decoding_mode is 'immediate', then we can
> immediately switch to partial serialize mode. This will eliminate the
> dependency on timing. The one argument against using this is that it
> won't be as clear as a separate parameter like
> 'stream_serialize_threshold' proposed by the patch but OTOH we already
> have a few parameters that serve a different purpose when used on the
> subscriber. For example, 'max_replication_slots' is used to define the
> maximum number of replication slots on the publisher and the maximum
> number of origins on subscribers. Similarly,
> wal_retrieve_retry_interval' is used for different purposes on
> subscriber and standby nodes.

Using the existing parameter makes sense to me. But if we use
logical_decoding_mode also on the subscriber, as Shveta Malik also
suggested, it's probably better to rename it to avoid confusion. For
example, logical_replication_mode or something.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-19 Thread Takamichi Osumi (Fujitsu)
Hi, Horiguchi-san and Amit-san


On Wednesday, November 9, 2022 3:41 PM Kyotaro Horiguchi 
 wrote:
> Using interval is not standard for this kind of parameter but it seems
> convenient. On the other hand, it's not great that the unit month introduces
> some subtle ambiguity.  This patch translates a month to 30 days but I'm not
> sure it's the right thing to do. Perhaps we shouldn't allow units larger
> than days.
In the past discussion, we talked about the merits of utilizing the interval
type. On the other hand, now we are facing some parsing incompatibilities
between this time-delayed feature and physical replication's
recovery_min_apply_delay.

For instance, the interval type can accept '600 m s h', '1d 10min' and '1m',
but recovery_min_apply_delay makes the server fail to start with all of
those.

Therefore, this would confuse users, and I'm going to make the feature's
input compatible with recovery_min_apply_delay in the next version.


Best Regards,
Takamichi Osumi





Re: Stack overflow issue

2023-01-19 Thread Egor Chindyaskin



03.01.2023 22:45, Sascha Kuhl writes:

Great work. Max Stack depth is memory dependent? Processor dependent?
Hello! These situations are not specific to the x86_64 architecture; they
also manifest themselves, for example, on the aarch64 architecture.
For example, this script, run on aarch64:

(n=100;printf "begin;"; for ((i=1;i<=$n;i++)); do printf "savepoint s$i;"; done; printf "release s1;" ) | psql > /dev/null

crashed the server on savepoint s174617 with the following stack trace:

Core was generated by `postgres: test test [local] SAVEPOINT '.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  AllocSetCheck (context=<error reading variable: Cannot access memory at address 0xe2397fe8>) at aset.c:1409

1409    {
(gdb) bt
#0  AllocSetCheck (context=<error reading variable: Cannot access memory at address 0xe2397fe8>) at aset.c:1409
#1  0xd78c38c4 in MemoryContextCheck (context=0xaaab39ee16a0) at mcxt.c:740
#2  0xd78c38dc in MemoryContextCheck (context=0xaaab39edf690) at mcxt.c:742
#3  0xd78c38dc in MemoryContextCheck (context=0xaaab39edd680) at mcxt.c:742
#4  0xd78c38dc in MemoryContextCheck (context=0xaaab39edb670) at mcxt.c:742
#5  0xd78c38dc in MemoryContextCheck (context=0xaaab39ed9660) at mcxt.c:742
#6  0xd78c38dc in MemoryContextCheck (context=0xaaab39ed7650) at mcxt.c:742
#7  0xd78c38dc in MemoryContextCheck (context=0xaaab39ed5640) at mcxt.c:742
#8  0xd78c38dc in MemoryContextCheck (context=0xaaab39ed3630) at mcxt.c:742
#9  0xd78c38dc in MemoryContextCheck (context=0xaaab39ed1620) at mcxt.c:742
#10 0xd78c38dc in MemoryContextCheck (context=0xaaab39ecf610) at mcxt.c:742

...
#174617 0xd78c38dc in MemoryContextCheck (context=0xe47994b0) at mcxt.c:742
#174618 0xd78c38dc in MemoryContextCheck (context=0xe476dcd0) at mcxt.c:742
#174619 0xd78c38dc in MemoryContextCheck (context=0xe46ead50) at mcxt.c:742

#174620 0xd76c7e24 in finish_xact_command () at postgres.c:2739
#174621 0xd76c55b8 in exec_simple_query (query_string=0xe46f0540 "savepoint s174617;") at postgres.c:1238
#174622 0xd76ca7a4 in PostgresMain (argc=1, argv=0xe2b96898, dbname=0xe471c098 "test", username=0xe471c078 "test") at postgres.c:4508
#174623 0xd75e263c in BackendRun (port=0xe4711470) at postmaster.c:4530
#174624 0xd75e1f70 in BackendStartup (port=0xe4711470) at postmaster.c:4252

#174625 0xd75dd4c0 in ServerLoop () at postmaster.c:1745
#174626 0xd75dcd3c in PostmasterMain (argc=3, argv=0xe46eacb0) at postmaster.c:1417
#174627 0xd74d462c in main (argc=3, argv=0xe46eacb0) at main.c:209





Re: Logical replication timeout problem

2023-01-19 Thread Peter Smith
On Fri, Jan 20, 2023 at 3:35 PM Amit Kapila  wrote:
>
> On Fri, Jan 20, 2023 at 7:40 AM Peter Smith  wrote:
> >
> > Here are some review comments for patch v3-0001.
> >
> > ==
> > src/backend/replication/logical/logical.c
> >
> > 3. forward declaration
> >
> > +/* update progress callback */
> > +static void update_progress_cb_wrapper(ReorderBuffer *cache,
> > +ReorderBufferTXN *txn,
> > +ReorderBufferChange *change);
> >
> > I felt this function wrapper name was a bit misleading... AFAIK every
> > other wrapper really does just wrap their respective functions. But
> > this one seems a bit different because it calls the wrapped function
> > ONLY if some threshold is exceeded. IMO maybe this function could have
> > some name that conveys this better:
> >
> > e.g. update_progress_cb_wrapper_with_threshold
> >
>
> I am wondering whether it would be better to move the threshold logic
> to the caller. Previously this logic was inside the function because
> it was being invoked from multiple places but now that won't be the
> case. Also, then your concern about the name would also be addressed.
>
> >
> > ~
> >
> > 7b.
> > Would it be neater to just call OutputPluginUpdateProgress here instead?
> >
> > e.g.
> > BEFORE
> > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> > AFTER
> > OutputPluginUpdateProgress(ctx, false);
> >
>
> We already check whether ctx->update_progress is defined or not which
> is the only extra job done by OutputPluginUpdateProgress but probably
> we can consolidate the checks and directly invoke
> OutputPluginUpdateProgress.
>

Yes, I saw that, but I thought it was better to keep the early exit
from update_progress_cb_wrapper, so incurring just one additional
boolean check for every 100 changes was not anything to worry about.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Unicode grapheme clusters

2023-01-19 Thread Pavel Stehule
On Fri, Jan 20, 2023 at 2:55 AM Bruce Momjian  wrote:

> On Thu, Jan 19, 2023 at 07:53:43PM -0500, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > I am not sure what you are referring to above?  character_length?  I
> was
> > > talking about display length, and psql uses that --- at some point, our
> > > lack of support for graphemes will cause psql to not align columns.
> >
> > That's going to happen regardless, as long as we can't be sure
> > what the display will do with the characters --- and that's a
> > problem that will persist for a very long time.
> >
> > Ideally, yeah, it'd be great if all this stuff rendered perfectly;
> > but IMO it's so far outside mainstream usage of psql that it's
> > not something that could possibly repay the investment of time
> > to get even a partial solution.
>
> We have a few options:
>
> *  TODO item
> *  document psql works that way
> *  do nothing
>
> I think the big question is how common such cases will be in the future.
> The reports from 2022 and 2019 didn't seem to clearly outline the issue,
> so it would be good to have something documented somewhere.
>

There could be a note in the psql documentation like "Unicode grapheme
clusters are not supported yet. They are also not well supported by other
necessary software like terminal emulators and curses libraries".

I partially follow the progress in VTE, one of the widely used terminal
libraries, and I am very sceptical that there will be support in the next
two years.

Maybe the new Microsoft terminal will give this area new momentum, but
currently only a few people on the planet are working on fixing or
enhancing terminal technologies. Unfortunately there is too much
historical ballast.

Regards

Pavel


> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
> Embrace your flaws.  They make you human, rather than perfect,
> which you will never be.
>
>
>


Re: Adjust the description of OutputPluginCallbacks in pg-doc

2023-01-19 Thread Amit Kapila
On Fri, Jan 20, 2023 at 8:03 AM wangw.f...@fujitsu.com
 wrote:
>
> On Thurs, Jan 19, 2023 at 19:18 PM Amit Kapila  
> wrote:
> > On Wed, Jan 11, 2023 at 4:20 PM wangw.f...@fujitsu.com
> >  wrote:
> > >
> > > When I was reading the "Logical Decoding Output Plugins" chapter in
> > > pg-doc [1], I noticed that in the summary section, only the message_cb
> > > callback is not described as either required or optional, and the
> > > description of the stream_prepare_cb callback seems inaccurate.
> > >
> > > And after the summary section, I think only the stream_xxx_cb sections
> > > and the truncate_cb section are missing this tag (whether they are
> > > required or optional).
> > >
> > > I think we could improve these to be more reader friendly. So I tried to 
> > > write
> > > a patch for these and attach it.
> > >
> > > Any thoughts?
> > >
> >
> > This looks mostly good to me. I have made minor adjustments in the
> > attached. Do those make sense to you?
>
> Thanks for your improvement.
> It makes sense to me.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Generating code for query jumbling through gen_node_support.pl

2023-01-19 Thread Michael Paquier
On Thu, Jan 19, 2023 at 09:42:03AM +0100, Peter Eisentraut wrote:
> I see that in the 0003 patch, most location fields now have an explicit
> markup with query_jumble_ignore.  I thought we had previously resolved to
> consider location fields to be automatically ignored unless explicitly
> included (like for the Const node).  This appears to invert that?  Am I
> missing something?

As a result, I have rebased the patch set to use the two-attribute
approach: query_jumble_ignore and query_jumble_location.

On top of the three previous patches, I am adding 0004 to implement a
GUC able to switch the computation of the utility statements between
what I am calling "string" to compute the query IDs based on the hash
of the query string and the previous default, or "jumble", to use the
parsed tree, with a few more tests to see the difference.  Perhaps it
is not worth bothering, but it could be possible that some users don't
want to pay the penalty of doing the query jumbling with the parsed
tree for utilities, as well.
--
Michael
From 43a5bdffdb9dbe3ecb40e9bf8a5f0e9f12ceff32 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 18 Jan 2023 14:26:12 +0900
Subject: [PATCH v4 1/4] Rework format of comments for nodes

This is similar to 835d476, except that this one is to add node
properties related to query jumbling.
---
 src/include/nodes/parsenodes.h | 261 --
 src/include/nodes/plannodes.h  |   3 +-
 src/include/nodes/primnodes.h  | 469 ++---
 3 files changed, 487 insertions(+), 246 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index cfeca96d53..eec51e3ee2 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -123,7 +123,8 @@ typedef struct Query
 
 	CmdType		commandType;	/* select|insert|update|delete|merge|utility */
 
-	QuerySource querySource;	/* where did I come from? */
+	/* where did I come from? */
+	QuerySource querySource;
 
 	/*
 	 * query identifier (can be set by plugins); ignored for equal, as it
@@ -131,39 +132,58 @@ typedef struct Query
 	 */
 	uint64		queryId pg_node_attr(equal_ignore, read_write_ignore, read_as(0));
 
-	bool		canSetTag;		/* do I set the command result tag? */
+	/* do I set the command result tag? */
+	bool		canSetTag;
 
 	Node	   *utilityStmt;	/* non-null if commandType == CMD_UTILITY */
 
-	int			resultRelation; /* rtable index of target relation for
- * INSERT/UPDATE/DELETE/MERGE; 0 for SELECT */
+	/*
+	 * rtable index of target relation for INSERT/UPDATE/DELETE/MERGE; 0 for
+	 * SELECT.
+	 */
+	int			resultRelation;
 
-	bool		hasAggs;		/* has aggregates in tlist or havingQual */
-	bool		hasWindowFuncs; /* has window functions in tlist */
-	bool		hasTargetSRFs;	/* has set-returning functions in tlist */
-	bool		hasSubLinks;	/* has subquery SubLink */
-	bool		hasDistinctOn;	/* distinctClause is from DISTINCT ON */
-	bool		hasRecursive;	/* WITH RECURSIVE was specified */
-	bool		hasModifyingCTE;	/* has INSERT/UPDATE/DELETE in WITH */
-	bool		hasForUpdate;	/* FOR [KEY] UPDATE/SHARE was specified */
-	bool		hasRowSecurity; /* rewriter has applied some RLS policy */
-
-	bool		isReturn;		/* is a RETURN statement */
+	/* has aggregates in tlist or havingQual */
+	bool		hasAggs;
+	/* has window functions in tlist */
+	bool		hasWindowFuncs;
+	/* has set-returning functions in tlist */
+	bool		hasTargetSRFs;
+	/* has subquery SubLink */
+	bool		hasSubLinks;
+	/* distinctClause is from DISTINCT ON */
+	bool		hasDistinctOn;
+	/* WITH RECURSIVE was specified */
+	bool		hasRecursive;
+	/* has INSERT/UPDATE/DELETE in WITH */
+	bool		hasModifyingCTE;
+	/* FOR [KEY] UPDATE/SHARE was specified */
+	bool		hasForUpdate;
+	/* rewriter has applied some RLS policy */
+	bool		hasRowSecurity;
+	/* is a RETURN statement */
+	bool		isReturn;
 
 	List	   *cteList;		/* WITH list (of CommonTableExpr's) */
 
 	List	   *rtable;			/* list of range table entries */
-	List	   *rteperminfos;	/* list of RTEPermissionInfo nodes for the
- * rtable entries having perminfoindex > 0 */
+
+	/*
+	 * list of RTEPermissionInfo nodes for the rtable entries having
+	 * perminfoindex > 0
+	 */
+	List	   *rteperminfos;
 	FromExpr   *jointree;		/* table join tree (FROM and WHERE clauses);
  * also USING clause for MERGE */
 
 	List	   *mergeActionList;	/* list of actions for MERGE (only) */
-	bool		mergeUseOuterJoin;	/* whether to use outer join */
+	/* whether to use outer join */
+	bool		mergeUseOuterJoin;
 
 	List	   *targetList;		/* target list (of TargetEntry) */
 
-	OverridingKind override;	/* OVERRIDING clause */
+	/* OVERRIDING clause */
+	OverridingKind override;
 
 	OnConflictExpr *onConflict; /* ON CONFLICT DO [NOTHING | UPDATE] */
 
@@ -191,11 +211,14 @@ typedef struct Query
 	Node	   *setOperations;	/* set-operation tree if this is top level of
  * a UNION/INTERSECT/EXCEPT query */
 
-	List	   *constraintDeps; /* a list of pg_constraint OIDs that the query
-

Re: Logical replication timeout problem

2023-01-19 Thread Amit Kapila
On Fri, Jan 20, 2023 at 7:40 AM Peter Smith  wrote:
>
> Here are some review comments for patch v3-0001.
>
> ==
> src/backend/replication/logical/logical.c
>
> 3. forward declaration
>
> +/* update progress callback */
> +static void update_progress_cb_wrapper(ReorderBuffer *cache,
> +ReorderBufferTXN *txn,
> +ReorderBufferChange *change);
>
> I felt this function wrapper name was a bit misleading... AFAIK every
> other wrapper really does just wrap their respective functions. But
> this one seems a bit different because it calls the wrapped function
> ONLY if some threshold is exceeded. IMO maybe this function could have
> some name that conveys this better:
>
> e.g. update_progress_cb_wrapper_with_threshold
>

I am wondering whether it would be better to move the threshold logic
to the caller. Previously this logic was inside the function because
it was being invoked from multiple places but now that won't be the
case. Also, then your concern about the name would also be addressed.

>
> ~
>
> 7b.
> Would it be neater to just call OutputPluginUpdateProgress here instead?
>
> e.g.
> BEFORE
> ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> AFTER
> OutputPluginUpdateProgress(ctx, false);
>

We already check whether ctx->update_progress is defined or not which
is the only extra job done by OutputPluginUpdateProgress but probably
we can consolidate the checks and directly invoke
OutputPluginUpdateProgress.

-- 
With Regards,
Amit Kapila.




Re: generic plans and "initial" pruning

2023-01-19 Thread Tom Lane
Amit Langote  writes:
> On Fri, Jan 20, 2023 at 12:31 PM Tom Lane  wrote:
>> It might be possible to incorporate this pointer into PlannedStmt
>> instead of passing it separately.

> Yeah, that would be less churn.  Though, I wonder if you still hold
> that PlannedStmt should not be scribbled upon outside the planner as
> you said upthread [1]?

Well, the whole point of that rule is that the executor can't modify
a plancache entry.  If the plancache itself sets a field in such an
entry, that doesn't seem problematic from here.

But there's other possibilities if that bothers you; QueryDesc
could hold the field, for example.  Also, I bet we'd want to copy
it into EState for the main initialization recursion.

regards, tom lane




Re: generic plans and "initial" pruning

2023-01-19 Thread Amit Langote
On Fri, Jan 20, 2023 at 12:31 PM Tom Lane  wrote:
> Amit Langote  writes:
> > On Fri, Jan 20, 2023 at 4:39 AM Tom Lane  wrote:
> >> I had what felt like an epiphany: the whole problem arises because the
> >> system is wrongly factored.  We should get rid of AcquireExecutorLocks
> >> altogether, allowing the plancache to hand back a generic plan that
> >> it's not certain of the validity of, and instead integrate the
> >> responsibility for acquiring locks into executor startup.
>
> > Interesting.  The current implementation relies on
> > PlanCacheRelCallback() marking a generic CachedPlan as invalid, so
> > perhaps there will have to be some sharing of state between the
> > plancache and the executor for this to work?
>
> Yeah.  Thinking a little harder, I think this would have to involve
> passing a CachedPlan pointer to the executor, and what the executor
> would do after acquiring each lock is to ask the plancache "hey, do
> you still think this CachedPlan entry is valid?".  In the case where
> there's a problem, the AcceptInvalidationMessages call involved in
> lock acquisition would lead to a cache inval that clears the validity
> flag on the CachedPlan entry, and this would provide an inexpensive
> way to check if that happened.

OK, thanks, this is useful.

> It might be possible to incorporate this pointer into PlannedStmt
> instead of passing it separately.

Yeah, that would be less churn.  Though, I wonder if you still hold
that PlannedStmt should not be scribbled upon outside the planner as
you said upthread [1]?

> >> * In a successfully built execution state tree, there will simply
> >> not be any nodes corresponding to pruned-away, never-locked subplans.
>
> > I think this is true with the patch as proposed too, but I was still a
> > bit worried about what an ExecutorStart_hook may be doing with an
> > uninitialized plan tree.  Maybe we're mandating that the hook must
> > call standard_ExecutorStart() and only work with the finished
> > PlanState tree?
>
> It would certainly be incumbent on any such hook to not touch
> not-yet-locked parts of the plan tree.  I'm not particularly concerned
> about that sort of requirements change, because we'd be breaking APIs
> all through this area in any case.

OK.  Perhaps something that should be documented around ExecutorStart().

> >> * In some cases (views, at least) we need to acquire lock on relations
> >> that aren't directly reflected anywhere in the plan tree.  So there'd
> >> have to be a separate mechanism for getting those locks and rechecking
> >> validity afterward.  A list of relevant relation OIDs might be enough
> >> for that.
>
> > Hmm, a list of only the OIDs wouldn't preserve the lock mode,
>
> Good point.  I wonder if we could integrate this with the
> RTEPermissionInfo data structure?

You mean adding a rellockmode field to RTEPermissionInfo?

> > Would you like me to hack up a PoC or are you already on that?
>
> I'm not planning to work on this myself, I was hoping you would.

Alright, I'll try to get something out early next week.  Thanks for
all the pointers.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/922566.1648784745%40sss.pgh.pa.us




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-19 Thread shveta malik
On Thu, Jan 19, 2023 at 12:42 PM Takamichi Osumi (Fujitsu)
 wrote:
>
> On Wednesday, January 18, 2023 4:06 PM Peter Smith  
> wrote:
> >  Here are my review comments for the latest patch v16-0001. (excluding the
> > test code)
> Hi, thank you for your review !
>
> > ==
> >
> > General
> >
> > 1.
> >
> > Since the value of min_apply_delay cannot be < 0,  I was thinking probably 
> > it
> > should have been declared everywhere in this patch as a
> > uint64 instead of an int64, right?
> No, we won't be able to adopt this idea.
>
> It seems that we are not able to use unsigned integer types for catalog
> columns, so we can't apply uint64 to the pg_subscription.h definitions,
> nor to the variables passed to Int64GetDatum when storing catalog values.
>
> Plus, a value of type Interval can be negative, in which case the int64
> variable that receives the return value of interval2ms() cannot be
> changed to unsigned either.
>
> > ==
> >
> > Commit message
> >
> > 2.
> >
> > If the subscription sets min_apply_delay parameter, the logical replication
> > worker will delay the transaction commit for min_apply_delay milliseconds.
> >
> > ~
> >
> > IMO there should be another sentence before this just to say that a new
> > parameter is being added:
> >
> > e.g.
> > This patch implements a new subscription parameter called
> > 'min_apply_delay'.
> Added.
>
>
> > ==
> >
> > doc/src/sgml/config.sgml
> >
> > 3.
> >
> > +  
> > +   For time-delayed logical replication, the apply worker sends a 
> > Standby
> > +   Status Update message to the corresponding publisher per the
> > indicated
> > +   time of this parameter. Therefore, if this parameter is longer than
> > +   wal_sender_timeout on the publisher, then the
> > +   walsender doesn't get any update message during the delay and
> > repeatedly
> > +   terminates due to the timeout errors. Hence, make sure this 
> > parameter
> > is
> > +   shorter than the wal_sender_timeout of the
> > publisher.
> > +   If this parameter is set to zero with time-delayed replication, the
> > +   apply worker doesn't send any feedback messages during the
> > +   min_apply_delay.
> > +  
> >
> >
> > This paragraph seemed confusing. I think it needs to be reworded to change 
> > all
> > of the "this parameter" references because there are at least 3 different
> > parameters mentioned in this paragraph. e.g. maybe just change them to
> > explicitly name the parameter you are talking about.
> >
> > I also think it needs to mention the ‘min_apply_delay’ subscription 
> > parameter
> > up-front and then refer to it appropriately.
> >
> > The end result might be something like I wrote below (this is just my
> > guess -- probably you can word it better).
> >
> > SUGGESTION
> > For time-delayed logical replication (i.e. when the subscription is created 
> > with
> > parameter min_apply_delay > 0), the apply worker sends a Standby Status
> > Update message to the publisher with a period of 
> > wal_receiver_status_interval .
> > Make sure to set wal_receiver_status_interval less than the
> > wal_sender_timeout on the publisher, otherwise, the walsender will 
> > repeatedly
> > terminate due to the timeout errors. If wal_receiver_status_interval is set 
> > to zero,
> > the apply worker doesn't send any feedback messages during the subscriber’s
> > min_apply_delay period.
> Applied. Also, I added one reference for min_apply_delay parameter
> at the end of this description.
>
>
> > ==
> >
> > doc/src/sgml/ref/create_subscription.sgml
> >
> > 4.
> >
> > + 
> > +  By default, the subscriber applies changes as soon as possible. 
> > As
> > +  with the physical replication feature
> > +  (), it can be
> > useful to
> > +  have a time-delayed logical replica. This parameter lets the 
> > user to
> > +  delay the application of changes by a specified amount of
> > time. If this
> > +  value is specified without units, it is taken as milliseconds. 
> > The
> > +  default is zero(no delay).
> > + 
> >
> > 4a.
> > As with the physical replication feature (recovery_min_apply_delay), it can 
> > be
> > useful to have a time-delayed logical replica.
> >
> > IMO not sure that the above sentence is necessary. It seems only to be 
> > saying
> > that this parameter can be useful. Why do we need to say that?
> Removed the sentence.
>
>
> > ~
> >
> > 4b.
> > "This parameter lets the user to delay" -> "This parameter lets the user 
> > delay"
> > OR
> > "This parameter lets the user to delay" -> "This parameter allows the user 
> > to
> > delay"
> Fixed.
>
>
> > ~
> >
> > 4c.
> > "If this value is specified without units" -> "If the value is specified 
> > without
> > units"
> Fixed.
>
> > ~
> >
> > 4d.
> > "zero(no delay)." -> "zero (no delay)."
> Fixed.
>
> > 
> >
> > 5.
> >
> > + 
> > +  The delay occurs only on WAL records for transaction begins and

Re: Record queryid when auto_explain.log_verbose is on

2023-01-19 Thread Michael Paquier
On Fri, Jan 20, 2023 at 11:43:51AM +0900, torikoshia wrote:
> Sorry to make you go through the trouble of looking for it.
> I've now created it.
> https://commitfest.postgresql.org/42/4136/

FWIW, no objections from here.  This maps with EXPLAIN where the query
ID is only printed under VERBOSE.
--
Michael




Re: generic plans and "initial" pruning

2023-01-19 Thread Tom Lane
Amit Langote  writes:
> On Fri, Jan 20, 2023 at 4:39 AM Tom Lane  wrote:
>> I had what felt like an epiphany: the whole problem arises because the
>> system is wrongly factored.  We should get rid of AcquireExecutorLocks
>> altogether, allowing the plancache to hand back a generic plan that
>> it's not certain of the validity of, and instead integrate the
>> responsibility for acquiring locks into executor startup.

> Interesting.  The current implementation relies on
> PlanCacheRelCallback() marking a generic CachedPlan as invalid, so
> perhaps there will have to be some sharing of state between the
> plancache and the executor for this to work?

Yeah.  Thinking a little harder, I think this would have to involve
passing a CachedPlan pointer to the executor, and what the executor
would do after acquiring each lock is to ask the plancache "hey, do
you still think this CachedPlan entry is valid?".  In the case where
there's a problem, the AcceptInvalidationMessages call involved in
lock acquisition would lead to a cache inval that clears the validity
flag on the CachedPlan entry, and this would provide an inexpensive
way to check if that happened.

It might be possible to incorporate this pointer into PlannedStmt
instead of passing it separately.

>> * In a successfully built execution state tree, there will simply
>> not be any nodes corresponding to pruned-away, never-locked subplans.

> I think this is true with the patch as proposed too, but I was still a
> bit worried about what an ExecutorStart_hook may be doing with an
> uninitialized plan tree.  Maybe we're mandating that the hook must
> call standard_ExecutorStart() and only work with the finished
> PlanState tree?

It would certainly be incumbent on any such hook to not touch
not-yet-locked parts of the plan tree.  I'm not particularly concerned
about that sort of requirements change, because we'd be breaking APIs
all through this area in any case.

>> * In some cases (views, at least) we need to acquire lock on relations
>> that aren't directly reflected anywhere in the plan tree.  So there'd
>> have to be a separate mechanism for getting those locks and rechecking
>> validity afterward.  A list of relevant relation OIDs might be enough
>> for that.

> Hmm, a list of only the OIDs wouldn't preserve the lock mode,

Good point.  I wonder if we could integrate this with the
RTEPermissionInfo data structure?

> Would you like me to hack up a PoC or are you already on that?

I'm not planning to work on this myself, I was hoping you would.

regards, tom lane




Re: generic plans and "initial" pruning

2023-01-19 Thread Amit Langote
On Fri, Jan 20, 2023 at 4:39 AM Tom Lane  wrote:
> I spent some time re-reading this whole thread, and the more I read
> the less happy I got.

Thanks a lot for your time on this.

>  We are adding a lot of complexity and introducing
> coding hazards that will surely bite somebody someday.  And after awhile
> I had what felt like an epiphany: the whole problem arises because the
> system is wrongly factored.  We should get rid of AcquireExecutorLocks
> altogether, allowing the plancache to hand back a generic plan that
> it's not certain of the validity of, and instead integrate the
> responsibility for acquiring locks into executor startup.  It'd have
> to be optional there, since we don't need new locks in the case of
> executing a just-planned plan; but we can easily add another eflags
> bit (EXEC_FLAG_GET_LOCKS or so).  Then there has to be a convention
> whereby the ExecInitNode traversal can return an indicator that
> "we failed because the plan is stale, please make a new plan".

Interesting.  The current implementation relies on
PlanCacheRelCallback() marking a generic CachedPlan as invalid, so
perhaps there will have to be some sharing of state between the
plancache and the executor for this to work?

> There are a couple reasons why this feels like a good idea:
>
> * There's no need for worry about keeping the locking decisions in sync
> with what executor startup does.
>
> * We don't need to add the overhead proposed in the current patch to
> pass forward data about what got locked/pruned.  While that overhead
> is hopefully less expensive than the locks it saved acquiring, it's
> still overhead (and in some cases the patch will fail to save acquiring
> any locks, making it certainly a net negative).
>
> * In a successfully built execution state tree, there will simply
> not be any nodes corresponding to pruned-away, never-locked subplans.
> As long as code like EXPLAIN follows the state tree and doesn't poke
> into plan nodes that have no matching state, it's secure against the
> sort of problems that Robert worried about upthread.

I think this is true with the patch as proposed too, but I was still a
bit worried about what an ExecutorStart_hook may be doing with an
uninitialized plan tree.  Maybe we're mandating that the hook must
call standard_ExecutorStart() and only work with the finished
PlanState tree?

> While I've not attempted to write any code for this, I can also
> think of a few issues that'd have to be resolved:
>
> * We'd be pushing the responsibility for looping back and re-planning
> out to fairly high-level calling code.  There are only half a dozen
> callers of GetCachedPlan, so there's not that many places to be
> touched; but in some of those places the subsequent executor-start call
> is not close by, so that the necessary refactoring might be pretty
> painful.  I doubt there's anything insurmountable, but we'd definitely
> be changing some fundamental APIs.

Yeah.  I suppose mostly the same place that the current patch is
touching to pass around the PartitionPruneResult nodes.

> * In some cases (views, at least) we need to acquire lock on relations
> that aren't directly reflected anywhere in the plan tree.  So there'd
> have to be a separate mechanism for getting those locks and rechecking
> validity afterward.  A list of relevant relation OIDs might be enough
> for that.

Hmm, a list of only the OIDs wouldn't preserve the lock mode, so maybe
a list or bitmapset of the RTIs, something along the lines of
PlannedStmt.minLockRelids in the patch?

It perhaps even makes sense to make a special list in PlannedStmt for
only the views?

> * We currently do ExecCheckPermissions() before initializing the
> plan state tree.  It won't do to check permissions on relations we
> haven't yet locked, so that responsibility would have to be moved.
> Maybe that could also be integrated into the initialization recursion?
> Not sure.

Ah, I remember mentioning moving that into ExecGetRangeTableRelation()
[1], but I guess that misses relations that are not referenced in the
plan tree, such as views.  Though maybe that's not a problem if we
track views separately as mentioned above.

> * In the existing usage of AcquireExecutorLocks, if we do decide
> that the plan is stale then we are able to release all the locks
> we got before we go off and replan.  I'm not certain if that behavior
> needs to be preserved, but if it does then that would require some
> additional bookkeeping in the executor.

I think maybe we'll want to continue to release the existing locks,
because if we don't, it's possible we may keep some locks uselessly if
replanning might lock a different set of relations.

> * This approach is optimizing on the assumption that we usually
> won't need to replan, because if we do then we might waste a fair
> amount of executor startup overhead before discovering we have
> to throw all that state away.  I think that's clearly the right
> way to bet, but perhaps somebody else has a 

Re: Replace PROC_QUEUE / SHM_QUEUE with ilist.h

2023-01-19 Thread Andres Freund
Hi,

On 2022-12-03 10:17:22 -0800, Andres Freund wrote:
> On 2022-11-19 21:59:30 -0800, Andres Freund wrote:
> > In [1] Robert justifiably complained about the use of PROC_QUEUE. I've
> > previously been bothered by this in [2], but didn't get around to finishing
> > the patches.
> > 
> > One thing that made me hesitate was the naming of the function for a) 
> > checking
> > whether a dlist_node is a list, b) initializing and deleting nodes in a way 
> > to
> > allow for a).  My patch adds dlist_node_init(), dlist_delete_thoroughly() /
> > dlist_delete_from_thoroughly() / dclist_delete_from_thoroughly() and
> > dlist_node_is_detached().  Thomas proposed dlist_delete_and_reinit() and
> > dlist_node_is_linked() instead.
> 
> Any comments on these names? Otherwise I'm planning to push ahead with the
> names as is, lest I forget this patchset for another ~2 years.

Finally pushed, with some fairly minor additional cleanup. No more SHM_QUEUE,
yay!

Also, predicate.c really needs some love.

Greetings,

Andres Freund




Re: Record queryid when auto_explain.log_verbose is on

2023-01-19 Thread torikoshia

On 2023-01-19 19:05, Julien Rouhaud wrote:

Hi,

On Tue, Jan 17, 2023 at 10:53:23PM +0900, torikoshia wrote:

>
> For interactive EXPLAIN the query identifier is printed just after the
> plan,
> before the triggers and the JIT summary so auto_explain should do the
> same.
Thanks for the comment!
Agreed and updated the patch.


Thanks!


On 2023-01-17 03:53, Justin Pryzby wrote:
> On Mon, Jan 16, 2023 at 09:36:59PM +0900, torikoshia wrote:
> > Attached patch makes auto_explain also print query identifiers.
>
> This was asked during the initial patch; does your patch address the
> issues here ?
>
> https://www.postgresql.org/message-id/20200308142644.vlihk7djpwqjkp7w%40nol

Thanks!
I may be misunderstanding something, but it seems that the issue occurred
because queryid was calculated in pgss_post_parse_analyze() at that time.

```
--- queryid_exposure-v6.diff, which is the patch just before the discussion
@@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
..snip..

    if (query->utilityStmt)
    {
-       query->queryId = UINT64CONST(0);
+       if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+           && pstate->p_sourcetext)
+       {
+           const char *querytext = pstate->p_sourcetext;
+           int         query_location = query->stmt_location;
+           int         query_len = query->stmt_len;
+
+           /*
+            * Confine our attention to the relevant part of the string, if the
+            * query is a portion of a multi-statement source string.
+            */
+           querytext = pgss_clean_querytext(pstate->p_sourcetext,
+                                            &query_location,
+                                            &query_len);
+
+           query->queryId = pgss_compute_utility_queryid(querytext,
+                                                         query_len);
```

Currently queryId is not calculated in pgss_post_parse_analyze(), so the
issue does not occur, does it?
I confirmed that the attached patch logged the queryid for some utility
statements.


```
LOG:  0: duration: 0.017 ms  plan:
Query Text: prepare p1 as select 1;
Result  (cost=0.00..0.01 rows=1 width=4)
  Output: 1
Query Identifier: -1801652217649936326
```


Yes, this problem was fixed a long time ago.  Just to be sure I checked
that auto_explain and explain agree on the queryid:


Thanks for your comment and check!


=# select count(*) from pg_class;
[...]
LOG:  duration: 0.239 ms  plan:
Query Text: select count(*) from pg_class;
Aggregate  (cost=15.45..15.46 rows=1 width=8)
  Output: count(*)
  ->  Index Only Scan using pg_class_tblspc_relfilenode_index on
pg_catalog.pg_class  (cost=0.15..14.40 rows=417 width=0)
Output: reltablespace, relfilenode
Query Identifier: 2855866587085353326

=# explain (verbose) select count(*) from pg_class;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=15.45..15.46 rows=1 width=8)
   Output: count(*)
   ->  Index Only Scan using pg_class_tblspc_relfilenode_index on
pg_catalog.pg_class  (cost=0.15..14.40 rows=417 width=0)
         Output: reltablespace, relfilenode
         Query Identifier: 2855866587085353326

LOG:  duration: 0.000 ms  plan:
Query Text: explain (verbose) select count(*) from pg_class;
Aggregate  (cost=15.45..15.46 rows=1 width=8)
  Output: count(*)
  ->  Index Only Scan using pg_class_tblspc_relfilenode_index on
pg_catalog.pg_class  (cost=0.15..14.40 rows=417 width=0)
Output: reltablespace, relfilenode
Query Identifier: 2855866587085353326

So the patch looks good to me.  I didn't find any entry in the
commitfest, did I miss it?  If not, could you create one (feel free to
add me and Justin as reviewers, and probably mark it as RfC).


Sorry to make you go through the trouble of looking for it.
I've now created it.
https://commitfest.postgresql.org/42/4136/



It's a bit annoying that the info is missing since pg 14, but we probably
can't backpatch this as it might break log parser tools.


+1

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




RE: Adjust the description of OutputPluginCallbacks in pg-doc

2023-01-19 Thread wangw.f...@fujitsu.com
On Thurs, Jan 19, 2023 at 19:18 PM Amit Kapila  wrote:
> On Wed, Jan 11, 2023 at 4:20 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > When I was reading the "Logical Decoding Output Plugins" chapter in pg-doc
> [1],
> > I think in the summary section, only the callback message_cb is not 
> > described
> > whether it is required or optional, and the description of callback
> > stream_prepare_cb seems inaccurate.
> >
> > And after the summary section, I think only the callback stream_xxx_cb
> section
> > and the callback truncate_cb section are not described this tag (are they
> > required or optional).
> >
> > I think we could improve these to be more reader friendly. So I tried to 
> > write
> > a patch for these and attach it.
> >
> > Any thoughts?
> >
> 
> This looks mostly good to me. I have made minor adjustments in the
> attached. Do those make sense to you?

Thanks for your improvement.
It makes sense to me.

Regards,
Wang Wei


Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-01-19 Thread Michael Paquier
On Thu, Jan 19, 2023 at 12:23:16PM +0100, Jelte Fennema wrote:
> should be either:
> 1. a group membership check
> 2. group membership checks
> 
> Now it's mixed singular and plural.

Thanks, fixed.  And now applied the last patch.
--
Michael




Re: Logical replication timeout problem

2023-01-19 Thread Peter Smith
Here are some review comments for patch v3-0001.

==
Commit message

1.
The problem is when there is a DDL in a transaction that generates lots of
temporary data due to rewrite rules, these temporary data will not be processed
by the pgoutput - plugin. Therefore, the previous fix (f95d53e) for DML had no
impact on this case.

~

1a.
IMO this comment needs to give a bit of background about the original
problem here, rather than just starting with "The problem is" which is
describing the flaws of the previous fix.

~

1b.
"pgoutput - plugin" -> "pgoutput plugin" ??

~~~

2.

To fix this, we introduced a new ReorderBuffer callback -
'ReorderBufferUpdateProgressCB'. This callback is called to try to update the
process after each change has been processed during sending data of a
transaction (and its subtransactions) to the output plugin.

IIUC it's not really "after each change" - shouldn't this comment
mention something about the CHANGES_THRESHOLD 100?

==
src/backend/replication/logical/logical.c

3. forward declaration

+/* update progress callback */
+static void update_progress_cb_wrapper(ReorderBuffer *cache,
+                                       ReorderBufferTXN *txn,
+                                       ReorderBufferChange *change);

I felt this function wrapper name was a bit misleading... AFAIK every
other wrapper really does just wrap their respective functions. But
this one seems a bit different because it calls the wrapped function
ONLY if some threshold is exceeded. IMO maybe this function could have
some name that conveys this better:

e.g. update_progress_cb_wrapper_with_threshold

~~~

4. update_progress_cb_wrapper

+/*
+ * Update progress callback
+ *
+ * Try to update progress and send a keepalive message if too many changes were
+ * processed when processing txn.
+ *
+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time (exceeds the wal_receiver_timeout of standby) then it can timeout.
+ * This can happen when all or most of the changes are either not published or
+ * got filtered out.
+ */

SUGGESTION (instead of the "Try to update" sentence)
Send a keepalive message whenever more than CHANGES_THRESHOLD
changes are encountered while processing a transaction.

~~~

5.

+static void
+update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+                           ReorderBufferChange *change)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ static int changes_count = 0; /* Static variable used to accumulate
+                                * the number of changes while
+                                * processing txn. */
+

IMO this may be more readable if the static 'changes_count' local var
was declared first and separated from the other vars by a blank line.

~~~

6.

+ /*
+ * We don't want to try sending a keepalive message after processing each
+ * change as that can have overhead. Tests revealed that there is no
+ * noticeable overhead in doing it after continuously processing 100 or so
+ * changes.
+ */
+#define CHANGES_THRESHOLD 100

6a.
I think it might be better to define this right at the top of the
function adjacent to the 'changes_count' variable (e.g. a bit like the
original HEAD code looked)

~

6b.
SUGGESTION (for the comment)
Sending keepalive messages after every change has some overhead, but
testing showed there is no noticeable overhead if keepalive is only
sent after every ~100 changes.

~~~

7.

+
+ /*
+ * After continuously processing CHANGES_THRESHOLD changes, we
+ * try to send a keepalive message if required.
+ */
+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
+ changes_count = 0;
+ }
+

7a.
SUGGESTION (for comment)
Send a keepalive message after every CHANGES_THRESHOLD changes.

~

7b.
Would it be neater to just call OutputPluginUpdateProgress here instead?

e.g.
BEFORE
ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
AFTER
OutputPluginUpdateProgress(ctx, false);

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: pgindent vs variable declaration across multiple lines

2023-01-19 Thread Tom Lane
Andres Freund  writes:
> On 2023-01-19 20:43:44 -0500, Tom Lane wrote:
>> What reindent-all-branches pain?  We haven't done an all-branches
>> reindent in the past, even for pgindent fixes that touched far more
>> code than this would (assuming that the proposed fix doesn't have
>> other side-effects).

> Oh. I thought we had re-indented the other branches when we modified
> pg_bsd_intent substantially in the past, to reduce backpatching pain. But I
> guess we just discussed that option, but didn't end up pursuing it.

Yeah, we did discuss it, but never did it --- I think the convincing
argument not to was that major reformatting would be very painful
for people maintaining forks, and we shouldn't put them through that
to track minor releases.

regards, tom lane




Re: pgindent vs variable declaration across multiple lines

2023-01-19 Thread Andres Freund
Hi,

On 2023-01-19 20:43:44 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > There's a few places in the code that try to format a variable definition 
> > like this
> 
> > ReorderBufferChange *next_change =
> >     dlist_container(ReorderBufferChange, node, next);
> 
> > but pgindent turns that into
> 
> > ReorderBufferChange *next_change =
> > dlist_container(ReorderBufferChange, node, next);
> 
> Yeah, that's bugged me too.  I suspect that the triggering factor is
> use of a typedef name within the assigned expression, but I've not
> tried to run it to ground.

It's not that - it happens even with just
int frak =
1;

since it doesn't happen for plain assignments, I think it's somehow related to
code dealing with variable declarations.


> > I assume we'd again have to dive into pg_bsd_indent's code to fix it :(
> 
> Yeah :-(.  That's enough of a rat's nest that I've not really wanted to.
> But I'd support applying such a fix if someone can figure it out.

It's pretty awful code :(


> > And even if we were to figure out how, would it be worth the
> > reindent-all-branches pain? I'd say yes, but...
> 
> What reindent-all-branches pain?  We haven't done an all-branches
> reindent in the past, even for pgindent fixes that touched far more
> code than this would (assuming that the proposed fix doesn't have
> other side-effects).

Oh. I thought we had re-indented the other branches when we modified
pg_bsd_intent substantially in the past, to reduce backpatching pain. But I
guess we just discussed that option, but didn't end up pursuing it.

Greetings,

Andres Freund




Re: Unicode grapheme clusters

2023-01-19 Thread Bruce Momjian
On Thu, Jan 19, 2023 at 07:53:43PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > I am not sure what you are referring to above?  character_length?  I was
> > talking about display length, and psql uses that --- at some point, our
> > lack of support for graphemes will cause psql to not align columns.
> 
> That's going to happen regardless, as long as we can't be sure
> what the display will do with the characters --- and that's a
> problem that will persist for a very long time.
> 
> Ideally, yeah, it'd be great if all this stuff rendered perfectly;
> but IMO it's so far outside mainstream usage of psql that it's
> not something that could possibly repay the investment of time
> to get even a partial solution.

We have a few options:

*  TODO item
*  document psql works that way
*  do nothing

I think the big question is how common such cases will be in the future.
The reports from 2022 and 2019 didn't seem to clearly outline the issue,
so it would be good to have something documented somewhere.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Non-superuser subscription owners

2023-01-19 Thread Andres Freund
Hi,

On 2023-01-19 17:16:20 -0800, Jeff Davis wrote:
> The predefined role is probably the biggest user-facing part of the
> change. Does it mean that members can create any number of any kind of
> subscription?

I don't think we need to support complicated restriction schemes around this
now. I'm sure such needs exist, but I think there's more places where a simple
"allowed/not allowed" suffices.

You'd presumably just grant such a permission to "pseudo superuser"
users. They can typically do a lot of bad things already, so I don't really
see the common need to prevent them from creating many subscriptions.


> If so it may be hard to tighten down later, because we don't know what
> existing setups might break.

Presumably the unlimited number of subs case would still exist as an option
later - so I don't see the problem?


> Perhaps we can just permit a superuser to "ALTER SUBSCRIPTION ... OWNER
> TO ", which makes it simpler to use while still leaving the
> responisbility with the superuser to get it right. Maybe we even block
> the user from altering their own subscription (would be weird but not
> much weirder than what we have now)? I don't know if that solves the
> problem you're trying to solve, but it seems lower-risk.

That seems to not really get us very far. It's hard to use for users, and hard
to make secure for the hosted PG providers.

Greetings,

Andres Freund




Re: Non-superuser subscription owners

2023-01-19 Thread Andres Freund
Hi,

On 2023-01-19 10:45:35 -0500, Robert Haas wrote:
> On Wed, Jan 18, 2023 at 3:58 PM Mark Dilger
>  wrote:
> > > On Jan 18, 2023, at 12:51 PM, Robert Haas  wrote:
> > >
> > > Unless I'm missing something, it seems like this could be a quite small 
> > > patch.
> >
> > I didn't like the idea of the create/alter subscription commands needing to 
> > parse the connection string and think about what it might do, because at 
> > some point in the future we might extend what things are allowed in that 
> > string, and we have to keep everything that contemplates that string in 
> > sync.  I may have been overly hesitant to tackle that problem.  Or maybe I 
> > just ran short of round tuits.
> 
> I wouldn't be OK with writing our own connection string parser for
> this purpose, but using PQconninfoParse seems OK. We still have to
> embed knowledge of which connection string parameters can trigger
> local file access, but that doesn't seem like a massive problem to me.

> If we already had (or have) that logic someplace else, it would
> probably make sense to reuse it

We have. See at least postgres_fdw's check_conn_params(), dblink's
dblink_connstr_check() and dblink_security_check().

As part of the fix for 
https://postgr.es/m/20220925232237.p6uskba2dw6fnwj2%40awork3.anarazel.de
I am planning to introduce a bunch of server side helpers for dealing with
libpq (for establishing a connection while accepting interrupts). We could try
to centralize knowledge for those checks there.

The approach of checking, after connection establishment (see
dblink_security_check()), that we did in fact use the specified password,
scares me somewhat. See also below.


> The basic idea that by looking at which connection string properties are set
> we can tell what kinds of things the connection string is going to do seems
> sound to me.

I don't think you *can* check it purely based on existing connection string
properties, unfortunately. Think of e.g. a pg_hba.conf line of "local all user
peer" (quite reasonable config) or "host all all 127.0.0.1/32 trust" (less so).

Hence the hack with dblink_security_check().


I think there might be a discussion somewhere about adding an option to force
libpq to not use certain auth methods, e.g. plaintext password/md5. It's
possible this could be integrated.


Greetings,

Andres Freund




Re: pgindent vs variable declaration across multiple lines

2023-01-19 Thread Tom Lane
Andres Freund  writes:
> There's a few places in the code that try to format a variable definition 
> like this

> ReorderBufferChange *next_change =
>     dlist_container(ReorderBufferChange, node, next);

> but pgindent turns that into

> ReorderBufferChange *next_change =
> dlist_container(ReorderBufferChange, node, next);

Yeah, that's bugged me too.  I suspect that the triggering factor is
use of a typedef name within the assigned expression, but I've not
tried to run it to ground.

> I assume we'd again have to dive into pg_bsd_indent's code to fix it :(

Yeah :-(.  That's enough of a rat's nest that I've not really wanted to.
But I'd support applying such a fix if someone can figure it out.

> And even if we were to figure out how, would it be worth the
> reindent-all-branches pain? I'd say yes, but...

What reindent-all-branches pain?  We haven't done an all-branches
reindent in the past, even for pgindent fixes that touched far more
code than this would (assuming that the proposed fix doesn't have
other side-effects).

regards, tom lane




pgindent vs variable declaration across multiple lines

2023-01-19 Thread Andres Freund
Hi,

There's a few places in the code that try to format a variable definition like 
this

ReorderBufferChange *next_change =
    dlist_container(ReorderBufferChange, node, next);

but pgindent turns that into

ReorderBufferChange *next_change =
dlist_container(ReorderBufferChange, node, next);

even though the same pattern works, and is used fairly widely for assignments

amroutine->amparallelvacuumoptions =
    VACUUM_OPTION_PARALLEL_BULKDEL;

Particularly when variable and/or types names are longer, it's sometimes hard
to fit enough into one line to use a different style. E.g., the code I'm
currently hacking on has

RWConflict  possibleUnsafeConflict = dlist_container(RWConflictData, inLink, iter.cur);

There's simply no way to break that across lines that doesn't either
violate the line length limit or make pgindent do odd things:

too long line:
RWConflict  possibleUnsafeConflict = dlist_container(RWConflictData,
                                                     inLink,
                                                     iter.cur);

pgindent will move start of second line:
RWConflict  possibleUnsafeConflict =
dlist_container(RWConflictData, inLink, iter.cur);

I know I can leave the variable initially uninitialized and then do a
separate assignment, but that's not a great fix. And sometimes other
initializations want to access the variable already.


Do others dislike this as well?

I assume we'd again have to dive into pg_bsd_indent's code to fix it :(

And even if we were to figure out how, would it be worth the
reindent-all-branches pain? I'd say yes, but...

Greetings,

Andres Freund




Re: Experiments with Postgres and SSL

2023-01-19 Thread Jacob Champion
On Wed, Jan 18, 2023 at 7:16 PM Greg Stark  wrote:
> I had a conversation a while back with Heikki where he expressed that
> it was annoying that we negotiate SSL/TLS the way we do since it
> introduces an extra round trip. Aside from the performance
> optimization I think accepting standard TLS connections would open the
> door to a number of other opportunities that would be worth it on
> their own.

Nice! I want this too, but for security reasons [1] -- I want to be
able to turn off negotiated (explicit) TLS, to force (implicit)
TLS-only mode.

> Other things it would open the door to in order from least
> controversial to most
>
> * Hiding Postgres behind a standard SSL proxy terminating SSL without
> implementing the Postgres protocol.

+1

> * "Service Mesh" type tools that hide multiple services behind a
> single host/port ("Service Mesh" is just a new buzzword for "proxy").

If you want to multiplex protocols on a port, now is an excellent time
to require clients to use ALPN on implicit-TLS connections. (There are
no clients that can currently connect via implicit TLS, so you'll
never have another chance to force the issue without breaking
backwards compatibility.) That should hopefully make it harder to
ALPACA yourself or others [2].

ALPN doesn't prevent cross-port attacks though, and speaking of those...

> * Browser-based protocol implementations using websockets for things
> like pgadmin or other tools to connect directly to postgres using
> Postgres wire protocol but using native SSL implementations.
>
> * Postgres could even implement an HTTP based version of its protocol
> and enable things like queries or browser based tools using straight
> up HTTP requests so they don't need to use websockets.
>
> * Postgres could implement other protocols to serve up data like
> status queries or monitoring metrics, using HTTP based standard
> protocols instead of using our own protocol.

I see big red warning lights going off in my head -- in a previous
life, I got to fix vulnerabilities that resulted from bolting HTTP
onto existing protocol servers. Not only do you opt into the browser
security model forever, you also gain the ability to speak for any
other web server already running on the same host.

(I know you have PG committers who are also HTTP experts, and I think
you were hacking on mod_perl well before I knew web servers existed.
Just... please be careful. ;D )

> Incidentally I find the logic in ProcessStartupPacket incredibly
> confusing. It took me a while before I realized it's using tail
> recursion to implement the startup logic. I think it would be way more
> straightforward and extensible if it used a much more common iterative
> style. I think it would make it possible to keep more state than just
> ssl_done and gss_done without changing the function signature every
> time for example.

+1. The complexity of the startup logic, both client- and server-side,
is a big reason why I want implicit TLS in the first place. That way,
bugs in that code can't be exploited before the TLS handshake
completes.

Thanks!
--Jacob

[1] 
https://www.postgresql.org/message-id/flat/fcc3ebeb7f05775b63f3207ed52a54ea5d17fb42.camel%40vmware.com
[2] https://alpaca-attack.com/




Re: Non-superuser subscription owners

2023-01-19 Thread Jeff Davis
On Thu, 2023-01-19 at 14:11 -0500, Robert Haas wrote:
> I guess I'm not quite seeing it. Why can't we write a small patch to
> get this working right now, probably in a few hours, and deal with
> any
> improvements that people want at a later time?

To me, it's worrisome when there are more than a few loose ends, and
here it seems like there are more like five. No single issue is a
blocker, but I believe we'd end up with a better user-facing solution
if we solved a couple of these lower-level issues (and think a little
more about the other ones) before we expose new functionality to the
user.

The predefined role is probably the biggest user-facing part of the
change. Does it mean that members can create any number of any kind of
subscription? If so it may be hard to tighten down later, because we
don't know what existing setups might break.

Perhaps we can just permit a superuser to "ALTER SUBSCRIPTION ... OWNER
TO ", which makes it simpler to use while still leaving the
responsibility with the superuser to get it right. Maybe we even block
the user from altering their own subscription (would be weird but not
much weirder than what we have now)? I don't know if that solves the
problem you're trying to solve, but it seems lower-risk.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: PL/Python: Fix return in the middle of PG_TRY() block.

2023-01-19 Thread Andres Freund
Hi,

On 2023-01-16 10:29:03 -0500, Tom Lane wrote:
> Xing Guo  writes:
> > Are there any unsafe codes in pltcl.c? The return statement is in the
> > PG_CATCH() block, I think the exception stack has been recovered in
> > PG_CATCH block so the return statement in PG_CATCH block should be ok?
> 
> Yes, the stack has already been unwound at the start of a PG_CATCH
> (or PG_FINALLY) block, so there is no reason to avoid returning
> out of those.

It's probably true for PG_CATCH, but for PG_FINALLY? Won't returning lead us
to miss rethrowing the error? I guess you can argue that's desired, but then
why would one use PG_FINALLY?


I'm somewhat dubious about allowing to return inside PG_CATCH, even if it's
safe today.


> In principle you could also mess things up with a "continue", "break",
> or "goto" leading out of PG_TRY.  That's probably far less likely
> than "return", but I wonder whether Andres' compiler hack will
> catch that.

I haven't tested it, but it should - it basically traces every path and sees
whether there's any way the "capability" isn't released. To the point that
it's very annoying in other contexts, because it doesn't deal well with
conditional lock acquisition/releases.

Greetings,

Andres Freund




Re: Unicode grapheme clusters

2023-01-19 Thread Tom Lane
Bruce Momjian  writes:
> I am not sure what you are referring to above?  character_length?  I was
> talking about display length, and psql uses that --- at some point, our
> lack of support for graphemes will cause psql to not align columns.

That's going to happen regardless, as long as we can't be sure
what the display will do with the characters --- and that's a
problem that will persist for a very long time.

Ideally, yeah, it'd be great if all this stuff rendered perfectly;
but IMO it's so far outside mainstream usage of psql that it's
not something that could possibly repay the investment of time
to get even a partial solution.

regards, tom lane




Re: Unicode grapheme clusters

2023-01-19 Thread Bruce Momjian
On Thu, Jan 19, 2023 at 07:37:48PM -0500, Greg Stark wrote:
> This is how we've always documented it. Postgres treats code points as
> "characters" not graphemes.
> 
> You don't need to go to anything as esoteric as emojis to see this either.
> Accented characters like é have no canonical forms that are multiple code
> points and in some character sets some accented characters can only be
> represented that way.
> 
> But I don't think there's any reason to consider changing e existing 
> functions.
> They have to be consistent with substr and the other string manipulation
> functions.
> 
> We could add new functions to work with graphemes but it might bring more pain
> keeping it up to date

I am not sure what you are referring to above?  character_length?  I was
talking about display length, and psql uses that --- at some point, our
lack of support for graphemes will cause psql to not align columns.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-01-19 Thread David Rowley
On Tue, 10 Jan 2023 at 15:08, Andres Freund  wrote:
> Thanks for letting me now. Updated version attached.

I'm not too sure I'm qualified to give a meaningful design review
here, but I have started looking at the patches and so far have only
made it as far as 0006.

I noted down the following while reading:

v2-0001:

1. BufferCheckOneLocalPin needs a header comment

v2-0002:

2. The following comment and corresponding code to release the
extension lock has been moved now.

/*
* Release the file-extension lock; it's now OK for someone else to extend
* the relation some more.
*/

I think it's worth detailing out why it's fine to release the
extension lock in the new location. You've added detail to the commit
message but I think you need to do the same in the comments too.

v2-0003

3. FileFallocate() and FileZero() should likely document what they
return, i.e zero on success and non-zero on failure.

4. I'm not quite clear on why you've modified FileGetRawDesc() to call
FileAccess() twice.

v2-0004:

5. Is it worth having two versions of PinLocalBuffer() one to adjust
the usage count and one that does not? Couldn't the version that does
not adjust the count skip doing pg_atomic_read_u32()?

v2-0005
v2-0006

David




Re: [PATCH] Teach planner to further optimize sort in distinct

2023-01-19 Thread David Rowley
On Fri, 20 Jan 2023 at 08:26, Ankit Kumar Pandey  wrote:
> > On 19/01/23 18:49, David Rowley wrote:
> > You can just switch to using that function in
> > create_final_distinct_paths(). You'll need to consider if the query is
> > a DISTINCT ON query and not try the unordered version of the function
> > in that case.
>
> Tried this, it worked as expected. Tests are green as well.

Looking at the patch, you've not added any additional tests.  If the
existing tests are all passing then that just tells me that either the
code is not functioning as intended or we have no tests that look at
the EXPLAIN output which can make use of this optimization. If the
former is true, then the patch needs to be fixed. If it's the latter
then you need to write new tests.

> > It's possible that truncate_useless_pathkeys() needs to be modified to
> > check if the pathkeys might be useful for DISTINCT
>
> We have pathkeys_useful_for_merging and pathkeys_useful_for_ordering.
>
> Can we not have pathkeys_useful_for_distinct?

I don't know all the repercussions. If you look at add_path() then
you'll see we do a pathkey comparison when the costs are not fuzzily
different from an existing path so that we try to keep a path with the
best pathkeys.  If we start keeping paths around with other weird
pathkeys that are not along the lines of the query_pathkeys requires,
then add_path might start throwing away paths that are actually good
for something.  It seems probable that could cause some regressions.

> I have attached patch with pathkeys_count_contained_in_unordered
> and corresponding changes in create_final_distinct_paths for reference.

Does this patch actually work?  I tried:

create table ab (a int, b int);
insert into ab select a,b from generate_Series(1,1000)
a,generate_series(1,1000) b;
analyze ab;
create index on ab(a);
set enable_hashagg=0;
explain select distinct b,a from ab where a < 10;
                                  QUERY PLAN
------------------------------------------------------------------------
 Unique  (cost=729.70..789.67 rows=7714 width=8)
   ->  Sort  (cost=729.70..749.69 rows=7996 width=8)
         Sort Key: b, a
         ->  Index Scan using ab_a_idx on ab  (cost=0.42..211.36 rows=7996 width=8)
               Index Cond: (a < 10)
(5 rows)

I'd have expected an incremental sort here. I don't see that you're
adjusting IndexPath's pathkeys anywhere. The nested loop in
pathkeys_count_contained_in_unordered() seems to be inside out. The
reordered_pathkeys must have the common pathkeys in the order they
appear in keys2. In your patch, they'll be ordered according to the
keys1 list, which is wrong. Testing would tell you this, so all the
more reason to make the patch work and write some queries to ensure it
does actually work, then some tests to ensure that remains in a
working state.

Feel free to take the proper time to write a working patch which
contains new tests to ensure it's functioning as intended. It's
disheartening to review patches that don't seem to work. If it wasn't
meant to work, then you didn't make that clear. I'll likely not be
able to do any further reviews until the March commitfest, so it might
be better to only post if you're stuck.  Please don't rush out the
next patch. Take your time and study the code and see if you can build
up your own picture for what the repercussions might be of IndexPaths
having additional pathkeys when they were previously empty.  If you're
uncertain of aspects of the patch you've written feel free to leave
XXX type comments to indicate this. That way the reviewer will know
you might need more guidance on that and you'll not forget yourself
when you come back and look again after a few weeks.

David




Re: Unicode grapheme clusters

2023-01-19 Thread Greg Stark
This is how we've always documented it. Postgres treats code points as
"characters" not graphemes.

You don't need to go to anything as esoteric as emojis to see this either.
Accented characters like é have decomposed forms that are multiple code
points, and in some character sets some accented characters can only be
represented that way.

But I don't think there's any reason to consider changing the existing
functions. They have to be consistent with substr and the other string
manipulation functions.

We could add new functions to work with graphemes, but it might bring
more pain keeping them up to date.


Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-19 Thread Peter Geoghegan
On Thu, Jan 19, 2023 at 3:38 PM Andres Freund  wrote:
> Another version of this could be to integrate analyze.c's scan more closely
> with vacuum all the time. It's a bit bonkers that we often sequentially read
> blocks, evict them from shared buffers if we read them, just to then
> afterwards do random IO for blocks we've already read. That's imo what we
> eventually should do, but clearly it's not a small project.

Very often, what we're really missing in VACUUM is high level context.
That's true of what you say here, about analyze.c, as well as
complaints like your vac_estimate_reltuples complaint. The problem
scenarios involving vac_estimate_reltuples all involve repeating the
same silly action again and again, never realizing that that's what's
going on. I've found it very useful to think of one VACUUM as picking
up where the last one left off for my work on freezing.

This seems related to pre-autovacuum historical details. VACUUM
shouldn't really be a command in the same way that CREATE INDEX is a
command. I do think that we need to retain a VACUUM command in some
form, but it should be something pretty close to a command that just
enqueues off-schedule autovacuums. That can do things like coalesce
duplicate requests into one.

Anyway, I am generally in favor of a design that makes VACUUM and
ANALYZE things that are more or less owned by autovacuum. It should be
less and less of a problem to blur the distinction between VACUUM and
ANALYZE under this model, in each successive release. These
distinctions are quite unhelpful, overall, because they make it hard
for autovacuum scheduling to work at the whole-system level.

> > This wouldn't have to happen every time, but it would happen fairly often.
>
> Do you have a mechanism for that in mind? Just something vacuum_count % 10 ==
> 0 like? Or remember scanned_pages in pgstats and re-computing

I was thinking of something very simple like that, yes.

> I think it'd be fine to just use analyze.c and pass in an option to not
> compute column and inheritance stats.

That could be fine. Just as long as it's not duplicative in an obvious way.

> > Presumably you'll want to add the same I/O prefetching logic to this
> > cut-down version, just for example. Since without that there will be
> > no competition between it and ANALYZE proper. Besides which, isn't it
> > kinda wasteful to not just do a full ANALYZE? Sure, you can avoid
> > detoasting overhead that way. But even still.
>
> It's not just that analyze is expensive, I think it'll also be confusing if
> the column stats change after a manual VACUUM without ANALYZE.

Possibly, but it doesn't have to happen there. It's not like the rules
aren't a bit different compared to autovacuum already. For example,
the way TOAST tables are handled by the VACUUM command versus
autovacuum.

Even if it's valuable to maintain this kind of VACUUM/autovacuum
parity (which I tend to doubt), doesn't the same argument work almost
as well with whatever stripped down version you come up with? It's
also confusing that a manual VACUUM command will be doing an
ANALYZE-like thing. Especially in cases where it's really expensive
relative to the work of VACUUM, because VACUUM scanned so few pages.
You just have to make some kind of trade-off.

-- 
Peter Geoghegan




Re: Use appendStringInfoSpaces more

2023-01-19 Thread David Rowley
On Fri, 20 Jan 2023 at 10:23, Peter Smith  wrote:
> Should the add_indent function also have a check to avoid making
> unnecessary calls to appendStringInfoSpaces when the level is 0?

Although I didn't opt to do that, thank you for having a look.

I do think the patch is trivially simple and nobody seems against it,
so I've now pushed it.

David




Re: Experiments with Postgres and SSL

2023-01-19 Thread Greg Stark
On Thu, 19 Jan 2023 at 15:49, Vladimir Sitnikov
 wrote:
>
> What if the server that supports 'fast TLS' added an extra notification in 
> case client connects with a classic TLS?
> Then a capable client could remember host:port and try with newer TLS appoach 
> the next time it connects.
>
> It would be transparent to the clients, and the users won't need to configure 
> 'prefer classic or fast TLS'
> The old clients could discard the notification.

Hm. I hadn't really thought about the case of a new client connecting
to an old server. I don't think it's worth implementing a code path in
the server like this as it would then become cruft that would be hard
to ever get rid of.

I think you can do the same thing, more or less, in the client. Like
if the driver tries to connect via SSL and gets an error it remembers
that host/port and connects using negotiation in the future.

In practice though, by the time drivers support this it'll probably be
far enough in the future that they can just enable it and you can
disable it if you're connecting to an old server. The main benefit for
the near term is going to be clients that are specifically designed to
take advantage of it because it's necessary to enable the environment
they need -- like monitoring tools and proxies.

I've attached the POC. It's not near committable, mainly because of
the lack of any proper interface to the added fields in Port. I
actually had a whole API but ripped it out while debugging because it
wasn't working out.

But here's an example of psql connecting to the same server via
negotiated SSL or through stunnel where stunnel establishes the SSL
connection and psql is just doing plain text:

stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql
'postgresql://localhost:9432/postgres'
psql (16devel)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384,
compression: off)
Type "help" for help.

postgres=# select * from pg_stat_ssl;
  pid  | ssl | version |         cipher         | bits | client_dn | client_serial | issuer_dn
-------+-----+---------+------------------------+------+-----------+---------------+-----------
 48771 | t   | TLSv1.3 | TLS_AES_256_GCM_SHA384 |  256 |           |               |
(1 row)

postgres=# \q
stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql
'postgresql://localhost:8999/postgres'
psql (16devel)
Type "help" for help.

postgres=# select * from pg_stat_ssl;
  pid  | ssl | version |         cipher         | bits | client_dn | client_serial | issuer_dn
-------+-----+---------+------------------------+------+-----------+---------------+-----------
 48797 | t   | TLSv1.3 | TLS_AES_256_GCM_SHA384 |  256 |           |               |
(1 row)


-- 
greg
From 4508f872720a0977cf00041a865d76a4d5f77028 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Wed, 18 Jan 2023 15:34:34 -0500
Subject: [PATCH v1] Direct SSL Connections

---
 src/backend/libpq/be-secure.c   |  13 +++
 src/backend/libpq/pqcomm.c  |   9 +-
 src/backend/postmaster/postmaster.c | 140 ++--
 src/include/libpq/libpq-be.h|   3 +
 src/include/libpq/libpq.h   |   2 +-
 5 files changed, 134 insertions(+), 33 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index a0f7084018..39366d04dd 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -235,6 +235,19 @@ secure_raw_read(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
 
+	/* XXX feed the raw_buf into SSL */
+	if (port->raw_buf_remaining > 0)
+	{
+		/* consume up to len bytes from the raw_buf */
+		if (len > port->raw_buf_remaining)
+			len = port->raw_buf_remaining;
+		Assert(port->raw_buf);
+		memcpy(ptr, port->raw_buf + port->raw_buf_consumed, len);
+		port->raw_buf_consumed += len;
+		port->raw_buf_remaining -= len;
+		return len;
+	}
+
 	/*
 	 * Try to read from the socket without blocking. If it succeeds we're
 	 * done, otherwise we'll wait for the socket using the latch mechanism.
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 864c9debe8..60fab6a52b 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1119,13 +1119,16 @@ pq_discardbytes(size_t len)
 /* 
  *		pq_buffer_has_data		- is any buffered data available to read?
  *
- * This will *not* attempt to read more data.
+ * Actually returns the number of bytes in the buffer...
+ *
+ * This will *not* attempt to read more data. And reading up to that number of
+ * bytes should not cause reading any more data either.
  * 
  */
-bool
+size_t
 pq_buffer_has_data(void)
 {
-	return (PqRecvPointer < PqRecvLength);
+	return (PqRecvLength - PqRecvPointer);
 }
 
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..b1631e0830 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -412,6 +412,7 @@ static void 

Re: Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value

2023-01-19 Thread Tom Lane
sirisha chamarthi  writes:
> On Wed, Oct 19, 2022 at 7:59 PM Kyotaro Horiguchi 
> wrote:
>> In short, the proposed fix alone seems fine to me. If we want to show
>> further details, I would add a bit as follows.
>> 
>> | * * WALAVAIL_REMOVED means it has been removed. A replication stream on
>> | *   a slot with this LSN cannot continue.  Note that the affected
>> | *   processes have been terminated by checkpointer, too.

> Thanks for your comments! Attached the patch with your suggestions.

Pushed with a bit of additional wordsmithing.  I thought "have been"
was a bit too strong of an assertion considering that this function
does not pay any attention to the actual state of any processes,
so I made it say "should have been".

regards, tom lane




Re: Use appendStringInfoSpaces more

2023-01-19 Thread David Rowley
On Fri, 20 Jan 2023 at 10:25, Tom Lane  wrote:
>
> Peter Smith  writes:
> > Should the add_indent function also have a check to avoid making
> > unnecessary calls to appendStringInfoSpaces when the level is 0?
>
> Seems like unnecessary extra notation, seeing that appendStringInfoSpaces
> will fall out quickly for a zero argument.

Yeah, agreed. As far as I see it, the level will only be 0 before the
first WJB_BEGIN_OBJECT and those appear to be the first thing in the
document, so we'll only indent level 0 once and everything else will
be > 0. So, it also seems to me that the additional check is more
likely to cost more than it would save.

David




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-19 Thread Andres Freund
Hi,

On 2023-01-19 15:10:38 -0800, Peter Geoghegan wrote:
> On Thu, Jan 19, 2023 at 2:54 PM Andres Freund  wrote:
> > Yea. Hence my musing about potentially addressing this by choosing to visit
> > additional blocks during the heap scan using vacuum's block sampling logic.
> 
> I'd rather just invent a way for vacuumlazy.c to tell the top-level
> vacuum.c caller "I didn't update reltuples, but you ought to go
> ANALYZE the table now that I'm done, even if you weren't already
> planning to do so".

I'm worried about increasing the number of analyzes that much - on a subset of
workloads it's really quite slow.


Another version of this could be to integrate analyze.c's scan more closely
with vacuum all the time. It's a bit bonkers that we often sequentially read
blocks, evict them from shared buffers if we read them, just to then
afterwards do random IO for blocks we've already read. That's imo what we
eventually should do, but clearly it's not a small project.


> This wouldn't have to happen every time, but it would happen fairly often.

Do you have a mechanism for that in mind? Just something vacuum_count % 10 ==
0 like? Or remember scanned_pages in pgstats and re-computing


> > IME most of the time in analyze isn't spent doing IO for the sample blocks
> > themselves, but CPU time and IO for toasted columns. A trimmed down version
> > that just computes relallvisible should be a good bit faster.
> 
> I worry about that from a code maintainability point of view. I'm
> concerned that it won't be very cut down at all, in the end.

I think it'd be fine to just use analyze.c and pass in an option to not
compute column and inheritance stats.


> Presumably you'll want to add the same I/O prefetching logic to this
> cut-down version, just for example. Since without that there will be
> no competition between it and ANALYZE proper. Besides which, isn't it
> kinda wasteful to not just do a full ANALYZE? Sure, you can avoid
> detoasting overhead that way. But even still.

It's not just that analyze is expensive, I think it'll also be confusing if
the column stats change after a manual VACUUM without ANALYZE.

It shouldn't be too hard to figure out whether we're going to do an analyze
anyway and not do the rowcount-estimate version when doing VACUUM ANALYZE or
if autovacuum scheduled an analyze as well.

Greetings,

Andres Freund




Re: Add LZ4 compression in pg_dump

2023-01-19 Thread Tomas Vondra
On 1/19/23 18:55, Tomas Vondra wrote:
> Hi,
> 
> On 1/19/23 17:42, gkokola...@pm.me wrote:
>>
>> ...
>>
>> Agreed. It was initially submitted as one patch. Then it was requested to be
>> split up in two parts, one to expand the use of the existing API and one to
>> replace with the new interface. Unfortunately the expansion of usage of the
>> existing API requires some tweaking, but that is not a very good reason for
>> the current patch set. I should have done a better job there.
>>
>> Please find v22 attach which combines back 0001 and 0002. It is missing the
>> documentation that was discussed above as I wanted to give a quick feedback.
>> Let me know if you think that the combined version is the one to move forward
>> with.
>>
> 
> Thanks, I'll take a look.
> 

After taking a look and thinking about it a bit more, I think we should
keep the two parts separate. I think Michael (or whoever proposed) the
split was right, it makes the patches easier to grok.

Sorry for the noise, hopefully we can just revert to the last version.

While reading the thread, I also noticed this:

> By the way, I think that this 0002 should drop all the default clauses
> in the switches for the compression method so as we'd catch any
> missing code paths with compiler warnings if a new compression method
> is added in the future.

Now I realize why there were "not yet implemented" errors for lz4/zstd
in all the switches, and why after removing them you had to add a
default branch.

We DON'T want a default branch, because the idea is that after adding a
new compression algorithm, we get warnings about switches not handling
it correctly.

So I guess we should walk back this change too :-( It's probably easier
to go back to v20 from January 16, and redo the couple remaining things
I commented on.


FWIW I think this is a hint that adding LZ4/ZSTD options, in 5e73a6048,
but without implementation, was not a great idea. It mostly defeats the
idea of getting the compiler warnings - all the places already handle
PG_COMPRESSION_LZ4/PG_COMPRESSION_ZSTD by throwing a pg_fatal. So you'd
have to grep for the options, inspect all the places or something like
that anyway. The warnings would only work for entirely new methods.

However, I now also realize the compressor API in 0002 replaces all of
this with calls to a generic API callback, so trying to improve this was
pretty silly of me.


Please, fix the couple remaining details in v20, add the docs for the
callbacks, and I'll try to polish it and get it committed.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Use fadvise in wal replay

2023-01-19 Thread Andres Freund
Hi,

On 2023-01-19 22:19:10 +0100, Tomas Vondra wrote:
> So I'm a bit unsure about this patch. It doesn't seem like it can perform
> better than read-ahead (although perhaps it does, on a different storage
> system).

I really don't see the point of the patch as-is. It's not going to help OSs
without useful readahead, because those don't have posix_fadvise either. And
hardcoded readahead distance isn't useful - on e.g. cloud storage 128kB won't
be large enough to overcome network latency.

I could see a *tad* more point in using posix_fadvise(fd, 0, 0,
POSIX_FADV_SEQUENTIAL) when opening WAL files, as that'll increase the kernel
readahead window over what's configured.


> With disabled read-ahead it helps (at least a bit), although I'm not
> really convinced there are good reasons to run without read-ahead. The
> reason for doing that was described like this:

Agreed. Postgres currently totally crashes and burns without OS
readahead. Buffered IO without readahead makes no sense. So I just don't see
the point of this patch.


> > Because database should know better than OS which data needs to be
> > prefetched and which should not. Big OS readahead affects index scan
> > performance.

I don't disagree fundamentally. But that doesn't make this patch a useful
starting point.

Greetings,

Andres Freund




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-19 Thread Peter Geoghegan
On Thu, Jan 19, 2023 at 2:54 PM Andres Freund  wrote:
> Yea. Hence my musing about potentially addressing this by choosing to visit
> additional blocks during the heap scan using vacuum's block sampling logic.

I'd rather just invent a way for vacuumlazy.c to tell the top-level
vacuum.c caller "I didn't update reltuples, but you ought to go
ANALYZE the table now that I'm done, even if you weren't already
planning to do so". This wouldn't have to happen every time, but it
would happen fairly often.

> IME most of the time in analyze isn't spent doing IO for the sample blocks
> themselves, but CPU time and IO for toasted columns. A trimmed down version
> that just computes relallvisible should be a good bit faster.

I worry about that from a code maintainability point of view. I'm
concerned that it won't be very cut down at all, in the end.

Presumably you'll want to add the same I/O prefetching logic to this
cut-down version, just for example. Since without that there will be
no competition between it and ANALYZE proper. Besides which, isn't it
kinda wasteful to not just do a full ANALYZE? Sure, you can avoid
detoasting overhead that way. But even still.

-- 
Peter Geoghegan




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-19 Thread Andres Freund
Hi,

On 2023-01-19 13:22:28 -0800, Peter Geoghegan wrote:
> On Thu, Jan 19, 2023 at 12:56 PM Andres Freund  wrote:
> > But in contrast to dead_tuples, where I think we can just stop analyze from
> > updating it unless we crashed recently, I do think we need to update 
> > reltuples
> > in vacuum. So computing an accurate value seems like the least unreasonable
> > thing I can see.
> 
> I agree, but there is no reasonable basis for treating scanned_pages
> as a random sample, especially if it's only a small fraction of all of
> rel_pages -- treating it as a random sample is completely
> unjustifiable.

Agreed.


> And so it seems to me that the only thing that can be done is to either make
> VACUUM behave somewhat like ANALYZE in at least some cases, or to have it
> invoke ANALYZE directly (or indirectly) in those same cases.

Yea. Hence my musing about potentially addressing this by choosing to visit
additional blocks during the heap scan using vacuum's block sampling logic.

IME most of the time in analyze isn't spent doing IO for the sample blocks
themselves, but CPU time and IO for toasted columns. A trimmed down version
that just computes relallvisible should be a good bit faster.

Greetings,

Andres Freund




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-19 Thread Andres Freund
Hi,

On 2023-01-19 13:36:41 -0800, Peter Geoghegan wrote:
> On Thu, Jan 19, 2023 at 12:58 PM Andres Freund  wrote:
> > There's absolutely no guarantee that autoanalyze is triggered
> > there. Particularly with repeated vacuums triggered due to an relfrozenxid 
> > age
> > that can't be advanced that very well might not happen within days on a 
> > large
> > relation.
> 
> Arguments like that work far better as arguments in favor of the
> vac_estimate_reltuples heuristics.

I don't agree. But mainly my issue is that the devil you know (how this has
worked for a while) is preferable to introducing an unknown quantity (your
patch that hasn't yet seen real world exposure).

Greetings,

Andres Freund




Re: DSA failed to allocate memory

2023-01-19 Thread Tom Lane
Thomas Munro  writes:
> Thanks for the report, and for working on the fix.  Can you please
> create a commitfest entry (if you haven't already)?  I plan to look at
> this soon, after the code freeze.

Hi Thomas, are you still intending to look at this DSA bug fix?
It's been sitting idle for months.

regards, tom lane




Re: Unicode grapheme clusters

2023-01-19 Thread Bruce Momjian
On Thu, Jan 19, 2023 at 02:44:57PM +0100, Pavel Stehule wrote:
> Surely it should be fixed. Unfortunately - all the terminals that I can use
> don't support it. So at this moment it may be premature to fix it, because the
> visual form will still be broken.

Yes, none of my terminal emulators handle grapheme clusters either.  In
fact, viewing this email messed up my screen and I had to use control-L
to fix it.

I think one big problem is that our Unicode library doesn't have any way
I know of to query the display device to determine how it
supports/renders Unicode characters, so any display width we report
could be wrong.

Oddly, it seems grapheme clusters were added in Unicode 3.2, which came
out in 2002:

https://www.unicode.org/reports/tr28/tr28-3.html
https://www.quora.com/What-is-graphemeCluster

but somehow I am only now studying them.

Anyway, I added a psql item for this so we don't forget about it:

https://wiki.postgresql.org/wiki/Todo#psql

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF

2023-01-19 Thread Justin Pryzby
On Thu, Jan 19, 2023 at 04:47:59PM -0500, Robert Treat wrote:
> I think all of that feedback is useful, I guess the immediate question
> becomes if Justin wants to try to proceed with his patch implementing
> the change, or if adjusting the documentation for the current
> implementation is the right move for now.

The docs change is desirable in any case, since it should be
backpatched, and any patch to change CREATE..PARTITION OF would be for
v17+ anyway.

-- 
Justin




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-19 Thread Andrew Dunstan


On 2023-01-18 We 10:33, Tom Lane wrote:
> Andrew Dunstan  writes:
>> fairywren and drongo are clean except for fairywren upgrading 9.6 to 11.
>> This appears to be a longstanding issue that the fuzz processing was
>> causing us to ignore. See for example
>> 
> Interesting.  I suspected that removing the fuzz allowance would teach
> us some things we hadn't known about.
>
>> I propose to add this to just the release 11 AdjustUpgrade.pm:
>>     # float4 values in this table on Msys can have precision differences
>>     # in representation between old and new versions
>>     if ($old_version < 10 && $dbnames{contrib_regression_btree_gist} &&
>>     $^O eq 'msys')
>>     {
>>     _add_st($result, 'contrib_regression_btree_gist',
>>             'drop table if exists float4tmp');
>>     }
> Seems reasonable (but I wonder if you don't need "$old_version < 11").
> A nicer answer would be to apply --extra-float-digits=0 across the
> board, but pre-v12 pg_dump lacks that switch.
>
>   


It turns out this was due to the fact that fairywren's setup changed
some time after the EOL of 9.6. I have rebuilt 9.6 and earlier
backbranches and there should now be no need for this adjustment.

There is still a Windows issue with MSVC builds <= 9.4 that I'm trying
to track down.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF

2023-01-19 Thread Robert Treat
On Wed, Jan 11, 2023 at 4:13 PM Robert Haas  wrote:
> On Wed, Jan 11, 2023 at 10:48 AM Robert Treat  wrote:
> > > @Robert: I wonder why shouldn't CREATE..PARTITION OF *also* be patched
> > > to first create a table, and then attach the partition, transparently
> > > doing what everyone would want, without having to re-read the updated
> > > docs or know to issue two commands?  I wrote a patch for this which
> > > "doesn't fail tests", but I still wonder if I'm missing something..
> > >
> >
> > I was thinking there might be either lock escalation issues or perhaps
> > issues around index attachment that don't surface using create
> > partition of, but I don't actually see any, in which case that does
> > seem like a better change all around. But like you, I feel I must be
> > overlooking something :-)
>
> To be honest, I'm not sure whether either of you are missing anything
> or not. I think a major reason why I didn't implement this was that
> it's a different code path. DefineRelation() has code to do a bunch of
> things that are also done by ATExecAttachPartition(), and I haven't
> gone through exhaustively and checked whether there are any relevant
> differences. I think that part of the reason that I did not research
> that at the time is that the patch was incredibly complicated to get
> working at all and I didn't want to take any risk of adding things to
> it that might create more problems. Now that it's been a few years, we
> might feel more confident.
>
> Another thing that probably deserves at least a bit of thought is the
> fact that ATTACH PARTITION just attaches a partition, whereas CREATE
> TABLE does a lot more things. Are any of those things potential
> hazards? Like what if the newly-created table references the parent
> via a foreign key, or uses the parent's row type as a column type or
> as part of a column default expression or in a CHECK constraint or
> something? Basically, try to think of weird scenarios where the new
> table would interact with the parent in some weird way where the
> weaker lock would be a problem. Maybe there's nothing to see here: not
> sure.
>
> Also, we need to separately analyze the cases where (1) the new
> partition is the default partition, (2) the new partition is not the
> default partition but a default partition exists, and (3) the new
> partition is not the default partition and no default partition
> exists.
>
> Sorry not to have more definite thoughts here. I know that when I
> developed the original patch, I thought about this case and decided my
> brain was full. However, I do not recall whether I knew about any
> specific problems that needed to be fixed, or just feared that there
> might be some.
>

I think all of that feedback is useful, I guess the immediate question
becomes if Justin wants to try to proceed with his patch implementing
the change, or if adjusting the documentation for the current
implementation is the right move for now.


Robert Treat
https://xzilla.net




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-19 Thread Tom Lane
Andrew Dunstan  writes:
> See
> 
> I tested it and it seems to be doing the right thing.

Yeah, seems to do what I want.  Thanks!

regards, tom lane




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-19 Thread Peter Geoghegan
On Thu, Jan 19, 2023 at 12:58 PM Andres Freund  wrote:
> There's absolutely no guarantee that autoanalyze is triggered
> there. Particularly with repeated vacuums triggered due to an relfrozenxid age
> that can't be advanced that very well might not happen within days on a large
> relation.

Arguments like that work far better as arguments in favor of the
vac_estimate_reltuples heuristics.

That doesn't mean that the heuristics are good in any absolute sense,
of course. They were just a band aid intended to ameliorate some of
the negative impact that came from treating scanned_pages as a random
sample. I think that we both agree that the real problem is that
scanned_pages just isn't a random sample, at least not as far as
reltuples/live tuples is concerned (for dead tuples it kinda isn't a
sample, but is rather something close to an exact count).

I now understand that you're in favor of addressing the root problem
directly. I am also in favor of that approach. I'd be more than happy
to get rid of the band aid as part of that whole effort.

-- 
Peter Geoghegan




Re: Operation log for major operations

2023-01-19 Thread Ted Yu
On Thu, Jan 19, 2023 at 1:12 PM Dmitry Koval  wrote:

>  >The patch does not apply on top of HEAD ...
>
> Thanks!
> Here is a fixed version.
>
> Additional changes:
> 1) get_operation_log() function doesn't create empty operation log file;
> 2) removed extra unlink() call.
>
> --
> With best regards,
> Dmitry Koval
>
> Postgres Professional: http://postgrespro.com

Hi,

Copyright (c) 1996-2022

Please update year for the license in pg_controllog.c

+check_file_exists(const char *datadir, const char *path)

There is existing helper function such as:

src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char *name);

Is it possible to reuse that code ?
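For reference, that helper is essentially a thin wrapper around stat(). A self-contained sketch of the same idea (the real dfmgr.c helper additionally reports errors other than ENOENT; this simplified version does not):

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Minimal existence check via stat(); returns true only when the path
 * can be stat'ed. Sketch only, not the actual backend code. */
static bool
file_exists(const char *name)
{
	struct stat st;

	return stat(name, &st) == 0;
}
```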

Cheers


Re: Use appendStringInfoSpaces more

2023-01-19 Thread Tom Lane
Peter Smith  writes:
> Should the add_indent function also have a check to avoid making
> unnecessary calls to appendStringInfoSpaces when the level is 0?

Seems like unnecessary extra notation, seeing that appendStringInfoSpaces
will fall out quickly for a zero argument.

regards, tom lane




Re: Use appendStringInfoSpaces more

2023-01-19 Thread Peter Smith
On Thu, Jan 19, 2023 at 8:45 PM David Rowley  wrote:
>
> In [1] I noticed a bit of a poor usage of appendStringInfoString which
> just appends 4 spaces in a loop, one for each indent level of the
> jsonb.  It should be better just to use appendStringInfoSpaces and
> just append all the spaces in one go rather than appending 4 spaces in
> a loop. That'll save having to check enlargeStringInfo() once for each
> loop.
>
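The quoted optimization can be illustrated outside of PostgreSQL with a simplified buffer (invented type, not the real StringInfo API): one capacity check plus one memset instead of one append call per space.

```c
#include <assert.h>
#include <string.h>

/* Simplified fixed-capacity string buffer for illustration only. */
typedef struct
{
	char	data[256];
	int		len;
} Buf;

static void
append_spaces(Buf *buf, int n)
{
	/* one bounds check and one memset, rather than n single-char appends */
	if (n > 0 && buf->len + n < (int) sizeof(buf->data))
	{
		memset(buf->data + buf->len, ' ', n);
		buf->len += n;
		buf->data[buf->len] = '\0';
	}
}
```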

Should the add_indent function also have a check to avoid making
unnecessary calls to appendStringInfoSpaces when the level is 0?

e.g.
if (indent)
{
appendStringInfoCharMacro(out, '\n');
if (level > 0)
appendStringInfoSpaces(out, level * 4);
 }

V.

if (indent)
{
appendStringInfoCharMacro(out, '\n');
appendStringInfoSpaces(out, level * 4);
 }

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-19 Thread Peter Geoghegan
On Thu, Jan 19, 2023 at 12:56 PM Andres Freund  wrote:
> > In other words, ANALYZE sometimes (but not always) produces wrong answers.
>
> For dead tuples, but not live tuples.

> > In other words, VACUUM sometimes (but not always) produces wrong answers.
>
> For live tuples, but not badly so for dead tuples.

Agreed. More generally, there is a need to think about the whole table
in some cases (like for regular optimizer statistics including
reltuples/live tuples), and the subset of pages that will be scanned
by VACUUM in other cases (for dead tuples accounting). Random sampling
at the table level is appropriate and works well enough for the
former, though not for the latter.

> We are, but perhaps not too badly so, because we can choose to believe analyze
> more for live tuples, and vacuum for dead tuples. Analyze doesn't compute
> reltuples incrementally and vacuum doesn't compute deadtuples incrementally.

Good points.

> But in contrast to dead_tuples, where I think we can just stop analyze from
> updating it unless we crashed recently, I do think we need to update reltuples
> in vacuum. So computing an accurate value seems like the least unreasonable
> thing I can see.

I agree, but there is no reasonable basis for treating scanned_pages
as a random sample, especially if it's only a small fraction of all of
rel_pages -- treating it as a random sample is completely
unjustifiable. And so it seems to me that the only thing that can be
done is to either make VACUUM behave somewhat like ANALYZE in at least
some cases, or to have it invoke ANALYZE directly (or indirectly) in
those same cases.

-- 
Peter Geoghegan




Re: Use fadvise in wal replay

2023-01-19 Thread Tomas Vondra
Hi,

I looked at this patch today. The change is fairly simple, so I decided
to do a benchmark. To prepare, I created a cluster with a 1GB database,
created a backup, and ran 1h UPDATE workload with WAL archiving. Then,
the actual benchmark does this:

1. restore the datadir backup
2. copy the WAL from archive
3. drop caches
4. start the cluster, measure time until end of recovery

I did this with master/patched, and with/without Linux readahead. And I
did this on two different machines - both have SSD storage, one (i5) has
a RAID of SATA SSD devices, the other one (xeon) has a single NVMe SSD.

The results (in seconds) look like this (the amount of WAL is different
on each machine, so the timings are not comparable).

  host   ra   master   patched
 ------------------------------
    i5    0      615       513
    i5  256      392       396
  xeon    0     2113      1436
  xeon  256     1487      1460

On i5 (smaller machine with RAID of 6 x SATA SSD), with read-ahead
enabled it takes ~390 seconds, and the patch makes no difference.
Without read-ahead, it takes ~615 seconds, and the patch does help a
bit, but it's hardly competitive to the read-ahead.

Note: 256 is read-ahead per physical device, the read-ahead value for
the whole RAID is 6x that, i.e. 1536. I was speculating that maybe the
hard-coded 128kB RACHUNK is not sufficient, so I tried with 512kB. But
that actually made it worse, and the timing deteriorated to ~640s (that
is, slower than master without read-ahead).

On the xeon (with NVMe SSD), it's different - the patch seems about as
effective as regular read-ahead. So that's good.


So I'm a bit unsure about this patch. It doesn't seem like it can perform
better than read-ahead (although perhaps it does, on a different storage
system).

With disabled read-ahead it helps (at least a bit), although I'm not
really convinced there are good reasons to run without read-ahead. The
reason for doing that was described like this:

> Because database should know better than OS which data needs to be
> prefetched and which should not. Big OS readahead affects index scan
> performance.

I don't recall seeing such an issue, and I can't find anything like that in
our mailing list archives either. Sure, that doesn't mean it can't
happen, and read-ahead is a heuristic so it can do weird stuff. But in
my experience it tends to work fairly well. The issues I've seen are
generally in the opposite direction, i.e. read-ahead not kicking in.


Anyway, it's not my intent to prevent this patch getting committed, if
someone wishes to do that. But I'm not quite convinced it actually helps
with a practical issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-19 Thread Israel Barth Rubio
Hello Jim,

> Hi Jelte, thanks for the message. You're right, an invalid cert path
> does solve the issue - I even use it for tests. Although it solves the
> authentication issue it still looks in my eyes like a non intuitive
> workaround/hack. Perhaps a new sslmode isn't the right place for this
> "feature"? Thanks again for the suggestion!

I do not think it is worth it to change the current behavior of PostgreSQL
in that sense.

PostgreSQL looks for the cert and key under `~/.postgresql` as a facility.
These files do not exist by default, so if PostgreSQL finds something in
there it assumes you want to use it.

I also think it is correct to choose certificate-based authentication over
password-based authentication when a certificate is found, as the
certificate provides stronger checks.

I believe that using libpq services would be a better approach if you
want to connect to several PostgreSQL clusters from the very same
source machine. That way you would specify whatever is specific to each
target cluster in a centralized configuration file and just reference each
target cluster by its service name in the connection string. It would
require that you move the SSL cert and key from `~/.postgresql` to somewhere
else and specify `sslcert` and `sslkey` in the expected service in the
`~/.pg_service.conf` file.

More info about that can be found at:

https://www.postgresql.org/docs/current/libpq-pgservice.html
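For illustration, a hypothetical `~/.pg_service.conf` along those lines could
look like this (service names, hosts and paths are all made up):

```ini
# Cluster that should use certificate authentication:
[cert-cluster]
host=db1.example.com
dbname=appdb
sslmode=verify-full
sslcert=/home/me/certs/db1.crt
sslkey=/home/me/certs/db1.key

# Cluster that should fall back to password authentication
# (no sslcert/sslkey, so no client certificate is sent):
[password-cluster]
host=db2.example.com
dbname=appdb
```

The client would then connect with e.g. `psql "service=cert-cluster"`.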

Best regards,
Israel.

>


Re: Operation log for major operations

2023-01-19 Thread Dmitry Koval

>The patch does not apply on top of HEAD ...

Thanks!
Here is a fixed version.

Additional changes:
1) get_operation_log() function doesn't create empty operation log file;
2) removed extra unlink() call.

--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

From d713a32499802395639645412a7c605870280f3a Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Mon, 14 Nov 2022 21:39:14 +0300
Subject: [PATCH v4] Operation log

---
 src/backend/access/transam/xlog.c |  10 +
 src/backend/backup/basebackup.c   |   1 +
 src/backend/storage/lmgr/lwlocknames.txt  |   1 +
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/meson.build|   1 +
 src/backend/utils/misc/pg_controllog.c| 142 
 src/bin/pg_checksums/pg_checksums.c   |   1 +
 src/bin/pg_resetwal/pg_resetwal.c |   4 +
 src/bin/pg_rewind/file_ops.c  |  28 +
 src/bin/pg_rewind/file_ops.h  |   2 +
 src/bin/pg_rewind/pg_rewind.c |  59 ++
 src/bin/pg_upgrade/controldata.c  |  52 ++
 src/bin/pg_upgrade/exec.c |   9 +-
 src/bin/pg_upgrade/pg_upgrade.c   |   2 +
 src/bin/pg_upgrade/pg_upgrade.h   |   2 +
 src/common/Makefile   |   1 +
 src/common/controllog_utils.c | 667 ++
 src/common/meson.build|   1 +
 src/include/catalog/pg_controllog.h   | 142 
 src/include/catalog/pg_proc.dat   |   9 +
 src/include/common/controllog_utils.h |  27 +
 src/test/modules/test_misc/meson.build|   1 +
 .../modules/test_misc/t/004_operation_log.pl  | 136 
 src/tools/msvc/Mkvcbuild.pm   |   4 +-
 24 files changed, 1298 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/utils/misc/pg_controllog.c
 create mode 100644 src/common/controllog_utils.c
 create mode 100644 src/include/catalog/pg_controllog.h
 create mode 100644 src/include/common/controllog_utils.h
 create mode 100644 src/test/modules/test_misc/t/004_operation_log.pl

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 8f47fb7570..dd3c4c7ac4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -68,6 +68,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "common/controldata_utils.h"
+#include "common/controllog_utils.h"
 #include "common/file_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -4775,6 +4776,9 @@ BootStrapXLOG(void)
/* some additional ControlFile fields are set in WriteControlFile() */
WriteControlFile();
 
+   /* Put information into operation log. */
+   put_operation_log_element(DataDir, OLT_BOOTSTRAP);
+
/* Bootstrap the commit log, too */
BootStrapCLOG();
BootStrapCommitTs();
@@ -5743,8 +5747,14 @@ StartupXLOG(void)
	SpinLockRelease(&XLogCtl->info_lck);
 
UpdateControlFile();
+
LWLockRelease(ControlFileLock);
 
+   /* Put information into operation log. */
+   if (promoted)
+   put_operation_log_element(DataDir, OLT_PROMOTED);
+   put_operation_log_element(DataDir, OLT_STARTUP);
+
/*
 * Shutdown the recovery environment.  This must occur after
 * RecoverPreparedTransactions() (see notes in lock_twophase_recover())
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 3fb9451643..0ca709b5b2 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -211,6 +211,7 @@ static const struct exclude_list_item excludeFiles[] =
  */
 static const struct exclude_list_item noChecksumFiles[] = {
{"pg_control", false},
+   {"pg_control_log", false},
{"pg_filenode.map", false},
{"pg_internal.init", true},
{"PG_VERSION", false},
diff --git a/src/backend/storage/lmgr/lwlocknames.txt 
b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..5673de1669 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock   46
 NotifyQueueTailLock47
+ControlLogFileLock 48
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index b9ee4eb48a..3fa20e0368 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -23,6 +23,7 @@ OBJS = \
help_config.o \
pg_config.o \
pg_controldata.o \
+   pg_controllog.o \
pg_rusage.o \
ps_status.o \
queryenvironment.o \
diff --git a/src/backend/utils/misc/meson.build 
b/src/backend/utils/misc/meson.build
index 

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-19 Thread Andres Freund
Hi

On 2023-01-18 19:26:22 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 7:04 PM Andres Freund  wrote:
> > > You seem to be saying that it's a problem if we don't update reltuples
> > > -- an estimate -- when less than 2% of the table is scanned by VACUUM.
> > > But why? Why can't we just do nothing sometimes? I mean in general,
> > > leaving aside the heuristics I came up with for a moment?
> >
> > The problem isn't that we might apply the heuristic once, that'd be fine. 
> > But
> > that there's nothing preventing it from applying until there basically are 
> > no
> > tuples left, as long as the vacuum is frequent enough.
> >
> > As a demo: The attached sql script ends up with a table containing 10k rows,
but relpages being set to 1 million.
> 
> I saw that too. But then I checked again a few seconds later, and
> autoanalyze had run, so reltuples was 10k. Just like it would have if
> there was no VACUUM statements in your script.

There's absolutely no guarantee that autoanalyze is triggered
there. Particularly with repeated vacuums triggered due to a relfrozenxid age
that can't be advanced, that very well might not happen within days on a large
relation.

Greetings,

Andres Freund




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-19 Thread Andres Freund
Hi,

On 2023-01-19 15:12:12 -0500, Robert Haas wrote:
> On Wed, Jan 18, 2023 at 1:31 PM Peter Geoghegan  wrote:
> > pgstat_report_analyze() will totally override the
> > tabentry->dead_tuples information that drives autovacuum.c, based on
> > an estimate derived from a random sample -- which seems to me to be an
> > approach that just doesn't have any sound theoretical basis.
> 
> In other words, ANALYZE sometimes (but not always) produces wrong answers.

For dead tuples, but not live tuples.


> On Wed, Jan 18, 2023 at 4:08 PM Andres Freund  wrote:
> > One complicating factor is that VACUUM sometimes computes an incrementally
> > more bogus n_live_tup when it skips pages due to the VM, whereas ANALYZE
> > computes something sane. I unintentionally encountered one when I was trying
> > something while writing this email, reproducer attached.
> 
> In other words, VACUUM sometimes (but not always) produces wrong answers.

For live tuples, but not badly so for dead tuples.


> TL;DR: We're screwed.

We are, but perhaps not too badly so, because we can choose to believe analyze
more for live tuples, and vacuum for dead tuples. Analyze doesn't compute
reltuples incrementally and vacuum doesn't compute deadtuples incrementally.



> I refuse to believe that any amount of math you can do on numbers that
> can be arbitrarily inaccurate will result in an accurate answer
> popping out the other end. Trying to update the reltuples estimate
> incrementally based on an estimate derived from a non-random,
> likely-to-be-skewed subset of the table is always going to produce
> distortion that gets worse and worse the more times you do it. If you
> could say, well, the existing estimate of let's say 100 tuples per
> page is based on the density being 200 tuples per page in the pages I
> just scanned and 50 tuples per page in the rest of the table, then you
> could calculate a new estimate that keeps the value of 50 tuples per
> page for the remainder of the table intact and just replaces the
> estimate for the part you just scanned. But we have no way of doing
> that, so we just make some linear combination of the old estimate with
> the new one. That overweights the repeatedly-sampled portions of the
> table more and more, making the estimate wronger and wronger.

Perhaps we should, at least occasionally, make vacuum do a cheaper version of
analyze's sampling to compute an updated reltuples. This could even happen
during the heap scan phase.

I don't like relying on analyze to fix vacuum's bogus reltuples, because
there's nothing forcing an analyze run soon after vacuum [incrementally]
screwed it up. Vacuum can be forced to run a lot of times due to xid horizons
preventing cleanup, after which there isn't anything forcing analyze to run
again.

But in contrast to dead_tuples, where I think we can just stop analyze from
updating it unless we crashed recently, I do think we need to update reltuples
in vacuum. So computing an accurate value seems like the least unreasonable
thing I can see.

Greetings,

Andres Freund




Re: document the need to analyze partitioned tables

2023-01-19 Thread Bruce Momjian
On Thu, Jan 19, 2023 at 01:50:05PM +0100, Laurenz Albe wrote:
> On Wed, 2023-01-18 at 16:23 -0500, Bruce Momjian wrote:
> > Is it possible to document when partition table statistics helps?
> 
> I think it would be difficult to come up with an exhaustive list.

I was afraid of that.  I asked only because most people assume
autovacuum handles _all_ statistics needs, but this case is not handled.
Do people even have any statistics maintenance process anymore, and if
not, how would they know they need to run a manual ANALYZE?
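For what it's worth, the manual step in question is just an ANALYZE on the
partitioned parent, e.g. from a cron job (table and database names below are
made up for illustration):

```sql
-- Gathers the inheritance statistics for the whole partition hierarchy,
-- which autovacuum currently never does for the parent itself.
ANALYZE measurements;

-- Or, from cron/systemd, for every table in a database:
--   vacuumdb --analyze-only appdb
```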

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Experiments with Postgres and SSL

2023-01-19 Thread Vladimir Sitnikov
It would be great if PostgreSQL supported 'start with TLS', however, how
could clients activate the feature?

I would like to avoid making users configure the handshake mode, and I
would like to avoid degrading performance when a new client talks to
an old database.

What if a server that supports 'fast TLS' added an extra notification in
case a client connects with classic TLS?
Then a capable client could remember the host:port and try the newer TLS
approach the next time it connects.

It would be transparent to the clients, and users won't need to
configure 'prefer classic or fast TLS'.
Old clients could simply discard the notification.
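A minimal sketch of that client-side cache (class and method names, and the
driver integration, are entirely hypothetical):

```python
# Sketch of a client remembering which servers advertised support for
# direct ("fast") TLS. Everything here is made up for illustration;
# no actual libpq/driver API is shown.
class FastTLSCache:
    def __init__(self):
        self._supports_fast = set()

    def note_server_capability(self, host, port):
        # Called when, during a classic TLS handshake, the server sends
        # the (hypothetical) notification that it supports direct TLS.
        self._supports_fast.add((host, port))

    def preferred_mode(self, host, port):
        # First connection uses classic negotiation; later ones can skip
        # the SSLRequest round trip if the server said it is capable.
        return "direct" if (host, port) in self._supports_fast else "classic"

cache = FastTLSCache()
assert cache.preferred_mode("db.example.com", 5432) == "classic"
cache.note_server_capability("db.example.com", 5432)
assert cache.preferred_mode("db.example.com", 5432) == "direct"
```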

Vladimir

-- 
Vladimir


Re: Transparent column encryption

2023-01-19 Thread Jacob Champion
On 12/31/22 06:17, Peter Eisentraut wrote:
> On 21.12.22 06:46, Peter Eisentraut wrote:
>> And another update.  The main changes are that I added an 'unspecified' 
>> CMK algorithm, which indicates that the external KMS knows what it is 
>> but the database system doesn't.  This was discussed a while ago.  I 
>> also changed some details about how the "cmklookup" works in libpq. Also 
>> added more code comments and documentation and rearranged some code.

Trying to delay a review until I had "completed it" has only led to me
not reviewing, so here's a partial one. Let me know what pieces of the
implementation and/or architecture you're hoping to get more feedback on.

I like the existing "caveats" documentation, and I've attached a sample
patch with some more caveats documented, based on some of the upthread
conversation:

- text format makes fixed-length columns leak length information too
- you only get partial protection against the Evil DBA
- RSA-OAEP public key safety

(Feel free to use/remix/discard as desired.)

When writing the paragraph on RSA-OAEP I was reminded that we didn't
really dig into the asymmetric/symmetric discussion. Assuming that most
first-time users will pick the builtin CMK encryption method, do we
still want to have an asymmetric scheme implemented first instead of a
symmetric keywrap? I'm still concerned about that public key, since it
can't really be made public. (And now that "unspecified" is available, I
think an asymmetric CMK could be easily created by users that have a
niche use case, and then we wouldn't have to commit to supporting it
forever.)

For the padding caveat:

> +  There is no concern if all values are of the same length (e.g., credit
> +  card numbers).

I nodded along to this statement last year, and then this year I learned
that CCNs aren't fixed-length. So with a 16-byte block, you're probably
going to be able to figure out who has an American Express card.

The column encryption algorithm is set per-column -- but isn't it
tightly coupled to the CEK, since the key length has to match? From a
layperson perspective, using the same key to encrypt the same plaintext
under two different algorithms (if they happen to have the same key
length) seems like it might be cryptographically risky. Is there a
reason I should be encouraged to do that?

With the loss of \gencr it looks like we also lost a potential way to
force encryption from within psql. Any plans to add that for v1?

While testing, I forgot how the new option worked and connected with
`column_encryption=on` -- and then I accidentally sent unencrypted data
to the server, since `on` means "not enabled". :( The server errors out
after the damage is done, of course, but would it be okay to strictly
validate that option's values?

Are there plans to document client-side implementation requirements, to
ensure cross-client compatibility? Things like the "PG\x00\x01"
associated data are buried at the moment (or else I've missed them in
the docs). If you're holding off until the feature is more finalized,
that's fine too.

Speaking of cross-client compatibility, I'm still disconcerted by the
ability to write the value "hello world" into an encrypted integer
column. Should clients be required to validate the text format, using
the attrealtypid?

It occurred to me when looking at the "unspecified" CMK scheme that the
CEK doesn't really have to be an encryption key at all. In that case it
can function more like a (possibly signed?) cookie for lookup, or even
be ignored altogether if you don't want to use a wrapping scheme
(similar to JWE's "direct" mode, maybe?). So now you have three ways to
look up or determine a column encryption key (CMK realm, CMK name, CEK
cookie)... is that a concept worth exploring in v1 and/or the documentation?

Thanks,
--Jacob

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 55f33a2f5f..06e1c077d5 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1588,7 +1588,33 @@ export PGCMKLOOKUP
 card numbers).  But if there are signficant length differences between
 valid values and that length information is security-sensitive, then
 application-specific workarounds such as padding would need to be applied.
-How to do that securely is beyond the scope of this manual.
+How to do that securely is beyond the scope of this manual.  Note that
+column encryption is applied to the text representation of the stored 
value,
+so length differences can be leaked even for fixed-length column types 
(e.g.
+bigint, whose largest decimal representation is longer
+than 16 bytes).
+   
+
+   
+Column encryption provides only partial protection against a malicious
+user with write access to the table.  Once encrypted, any modifications to 
a
+stored value on the server side will cause a decryption failure on the
+client.  However, a user with write access may still freely swap encrypted
+values between rows or columns 

Re: meson oddities

2023-01-19 Thread Andres Freund
Hi,

On 2023-01-19 21:37:15 +0100, Peter Eisentraut wrote:
> On 11.01.23 12:05, Peter Eisentraut wrote:
> > I think there is also an adjacent issue:  The subdir options may be
> > absolute or relative.  So if you specify --prefix=/usr/local and
> > --sysconfdir=/etc/postgresql, then
> > 
> >      config_paths_data.set_quoted('SYSCONFDIR', dir_prefix / dir_sysconf)
> > 
> > would produce something like /usr/local/etc/postgresql.

I don't think it would. The / operator understands absolute paths and doesn't
add the "first component" if the second component is absolute.


>  
>  dir_bin = get_option('bindir')
> +if not fs.is_absolute(dir_bin)
> +  dir_bin = dir_prefix / dir_bin
> +endif

Hm, I'm not sure this works entirely right on Windows. A path like /blub isn't
absolute on Windows, but it's not really relative either. It's a "drive local"
path, i.e. relative to the current drive (c:/), but not to the subdirectory
therein.
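Python's pathlib models this distinction, which may help illustrate the point:

```python
# "Drive-local" paths on Windows: /blub is anchored to a root but has no
# drive letter, so it is neither fully absolute nor purely relative.
from pathlib import PureWindowsPath

p = PureWindowsPath('/blub')
assert not p.is_absolute()        # no drive letter, so not absolute
assert p.root == '\\'             # ...but it is anchored to a root
assert PureWindowsPath('C:/blub').is_absolute()      # truly absolute
assert not PureWindowsPath('C:blub').is_absolute()   # drive-relative
```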

Greetings,

Andres Freund




Re: meson oddities

2023-01-19 Thread Peter Eisentraut

On 11.01.23 12:05, Peter Eisentraut wrote:
I think there is also an adjacent issue:  The subdir options may be 
absolute or relative.  So if you specify --prefix=/usr/local and 
--sysconfdir=/etc/postgresql, then


     config_paths_data.set_quoted('SYSCONFDIR', dir_prefix / dir_sysconf)

would produce something like /usr/local/etc/postgresql.

I think maybe we should make all the dir_* variables absolute right at 
the beginning, like


     dir_lib = get_option('libdir')
     if not fs.is_absolute(dir_lib)
     dir_lib = dir_prefix / dir_lib
     endif

And then the appending stuff could be done after that, keeping the 
current code.


Here is a proposed patch.  This should fix all these issues.
From cad243b60e0a45514f48dfc189069eaf883fc5a6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 19 Jan 2023 21:05:01 +0100
Subject: [PATCH] meson: Make all subdirectory variables absolute

Meson subdirectory options may be returned as absolute or relative,
depending on whether they are under prefix.  Meson itself can handle
both, but we might need an absolute path in some cases, so to simplify
this turn all paths to absolute right at the beginning.
---
 meson.build | 45 +
 src/include/meson.build | 24 +++---
 2 files changed, 49 insertions(+), 20 deletions(-)

diff --git a/meson.build b/meson.build
index 45fb9dd616..edbec530e2 100644
--- a/meson.build
+++ b/meson.build
@@ -457,27 +457,46 @@ endif
 # Directories
 ###
 
-# These are set by the equivalent --xxxdir configure options.  We
-# append "postgresql" to some of them, if the string does not already
-# contain "pgsql" or "postgres", in order to avoid directory clutter.
+# These are set by the equivalent --xxxdir configure options.
+#
+# Subdirectory options may be returned as absolute or relative,
+# depending on whether they are under prefix.  Meson itself can handle
+# both, but we might need an absolute path in some cases, so to
+# simplify this we turn all paths to absolute here.
+#
+# We append "postgresql" to some of the directories, if the string
+# does not already contain "pgsql" or "postgres", in order to avoid
+# directory clutter.
 
 pkg = 'postgresql'
 
 dir_prefix = get_option('prefix')
 
 dir_bin = get_option('bindir')
+if not fs.is_absolute(dir_bin)
+  dir_bin = dir_prefix / dir_bin
+endif
 
 dir_data = get_option('datadir')
+if not fs.is_absolute(dir_data)
+  dir_data = dir_prefix / dir_data
+endif
 if not (dir_data.contains('pgsql') or dir_data.contains('postgres'))
   dir_data = dir_data / pkg
 endif
 
 dir_sysconf = get_option('sysconfdir')
+if not fs.is_absolute(dir_sysconf)
+  dir_sysconf = dir_prefix / dir_sysconf
+endif
 if not (dir_sysconf.contains('pgsql') or dir_sysconf.contains('postgres'))
   dir_sysconf = dir_sysconf / pkg
 endif
 
 dir_lib = get_option('libdir')
+if not fs.is_absolute(dir_lib)
+  dir_lib = dir_prefix / dir_lib
+endif
 
 dir_lib_pkg = dir_lib
 if not (dir_lib_pkg.contains('pgsql') or dir_lib_pkg.contains('postgres'))
@@ -487,21 +506,31 @@ endif
 dir_pgxs = dir_lib_pkg / 'pgxs'
 
 dir_include = get_option('includedir')
+if not fs.is_absolute(dir_include)
+  dir_include = dir_prefix / dir_include
+endif
 
 dir_include_pkg = dir_include
-dir_include_pkg_rel = ''
 if not (dir_include_pkg.contains('pgsql') or 
dir_include_pkg.contains('postgres'))
   dir_include_pkg = dir_include_pkg / pkg
-  dir_include_pkg_rel = pkg
 endif
 
 dir_man = get_option('mandir')
+if not fs.is_absolute(dir_man)
+  dir_man = dir_prefix / dir_man
+endif
 
 # FIXME: These used to be separately configurable - worth adding?
 dir_doc = get_option('datadir') / 'doc' / 'postgresql'
+if not fs.is_absolute(dir_doc)
+  dir_doc = dir_prefix / dir_doc
+endif
 dir_doc_html = dir_doc
 
 dir_locale = get_option('localedir')
+if not fs.is_absolute(dir_locale)
+  dir_locale = dir_prefix / dir_locale
+endif
 
 
 # Derived values
@@ -2560,9 +2589,9 @@ mod_install_rpaths = []
 if host_system != 'darwin'
   # Add absolute path to libdir to rpath. This ensures installed binaries /
   # libraries find our libraries (mainly libpq).
-  bin_install_rpaths += dir_prefix / dir_lib
-  lib_install_rpaths += dir_prefix / dir_lib
-  mod_install_rpaths += dir_prefix / dir_lib
+  bin_install_rpaths += dir_lib
+  lib_install_rpaths += dir_lib
+  mod_install_rpaths += dir_lib
 
   # Add extra_lib_dirs to rpath. This ensures we find libraries we depend on.
   #
diff --git a/src/include/meson.build b/src/include/meson.build
index 51ad06cecb..b70ffdf785 100644
--- a/src/include/meson.build
+++ b/src/include/meson.build
@@ -28,18 +28,18 @@ configure_files += pg_config
 
 
 config_paths_data = configuration_data()
-config_paths_data.set_quoted('PGBINDIR', dir_prefix / dir_bin)
-config_paths_data.set_quoted('PGSHAREDIR', dir_prefix / dir_data)
-config_paths_data.set_quoted('SYSCONFDIR', dir_prefix / dir_sysconf)

Re: BF animal malleefowl reported an failure in 001_password.pl

2023-01-19 Thread Tom Lane
Thomas Munro  writes:
> So I think we probably need something like the attached, which I was
> originally trying to avoid.

Yeah, something like that.  I also wonder if you don't need to think
a bit harder about the ordering of the flag checks, in particular
it seems like servicing reload_request before child_exit might be
a good idea (remembering that child_exit might cause launching of
new children, so we want to be up to speed on relevant settings).

regards, tom lane




Re: almost-super-user problems that we haven't fixed yet

2023-01-19 Thread Robert Haas
On Thu, Jan 19, 2023 at 2:46 PM Nathan Bossart  wrote:
> > Thanks. I'd move it to the inner indentation level so it's closer to
> > the test at issue.
>
> I meant for it to cover the call to HaveNFreeProcs() as well since the same
> idea applies.  I left it the same for now, but if you still think it makes
> sense to move it, I'll do so.

Hmm, OK. If you want to leave it where it is, I won't argue further.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-19 Thread Robert Haas
On Wed, Jan 18, 2023 at 1:31 PM Peter Geoghegan  wrote:
> pgstat_report_analyze() will totally override the
> tabentry->dead_tuples information that drives autovacuum.c, based on
> an estimate derived from a random sample -- which seems to me to be an
> approach that just doesn't have any sound theoretical basis.

In other words, ANALYZE sometimes (but not always) produces wrong answers.

On Wed, Jan 18, 2023 at 4:08 PM Andres Freund  wrote:
> One complicating factor is that VACUUM sometimes computes an incrementally
> more bogus n_live_tup when it skips pages due to the VM, whereas ANALYZE
> computes something sane. I unintentionally encountered one when I was trying
> something while writing this email, reproducer attached.

In other words, VACUUM sometimes (but not always) produces wrong answers.

TL;DR: We're screwed.

I refuse to believe that any amount of math you can do on numbers that
can be arbitrarily inaccurate will result in an accurate answer
popping out the other end. Trying to update the reltuples estimate
incrementally based on an estimate derived from a non-random,
likely-to-be-skewed subset of the table is always going to produce
distortion that gets worse and worse the more times you do it. If you
could say, well, the existing estimate of let's say 100 tuples per
page is based on the density being 200 tuples per page in the pages I
just scanned and 50 tuples per page in the rest of the table, then you
could calculate a new estimate that keeps the value of 50 tuples per
page for the remainder of the table intact and just replaces the
estimate for the part you just scanned. But we have no way of doing
that, so we just make some linear combination of the old estimate with
the new one. That overweights the repeatedly-sampled portions of the
table more and more, making the estimate wronger and wronger.
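A toy simulation of that effect (the numbers are made up, and this is
deliberately not the actual vac_estimate_reltuples() logic, just the linear
combination described above):

```python
# Table with a small "hot" region (200 tuples/page) that vacuum scans
# repeatedly, and a cold remainder (50 tuples/page) it keeps skipping.
# The density estimate is updated as a linear combination weighted by
# the scanned fraction, so it drifts toward the hot region's density.
HOT_PAGES, COLD_PAGES = 100, 900
HOT_DENSITY, COLD_DENSITY = 200.0, 50.0
total_pages = HOT_PAGES + COLD_PAGES
true_density = (HOT_PAGES * HOT_DENSITY + COLD_PAGES * COLD_DENSITY) / total_pages

estimate = true_density              # start from a correct estimate
scanned_fraction = HOT_PAGES / total_pages
for _ in range(30):                  # 30 vacuums that only see the hot pages
    estimate = (1 - scanned_fraction) * estimate + scanned_fraction * HOT_DENSITY

print(f"true density {true_density:.0f}, estimate after 30 vacuums {estimate:.0f}")
```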

Now, that is already quite bad. But if we accept the premise that
neither VACUUM nor ANALYZE is guaranteed to ever produce a new
actually-reliable estimate, then not only will we go progressively
more wrong as time goes by, but we have no way of ever fixing
anything. If you get a series of unreliable data points followed by a
reliable data point, you can at least get back on track when the
reliable data shows up. But it sounds like you guys are saying that
there's no guarantee that will ever happen, which is a bit like
discovering that not only do you have a hole in your gas tank but
there is no guarantee that you will arrive at a gas station ever again
regardless of distance travelled.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: almost-super-user problems that we haven't fixed yet

2023-01-19 Thread Nathan Bossart
On Thu, Jan 19, 2023 at 02:17:35PM -0500, Robert Haas wrote:
> On Thu, Jan 19, 2023 at 12:54 PM Nathan Bossart
>  wrote:
>> > OK. Might be worth a short comment.
>>
>> I added one.
> 
> Thanks. I'd move it to the inner indentation level so it's closer to
> the test at issue.

I meant for it to cover the call to HaveNFreeProcs() as well since the same
idea applies.  I left it the same for now, but if you still think it makes
sense to move it, I'll do so.

> I would also suggest reordering the documentation and the
> postgresql.conf.sample file so that reserved_connections precedes
> superuser_reserved_connections, instead of following it.

Makes sense.

> Other than that, this seems like it might be about ready to commit,
> barring objections or bug reports.

Awesome.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From a6811b643df94c9057373fd687398c85a807fd5e Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 18 Jan 2023 12:43:41 -0800
Subject: [PATCH v5 1/3] Code review for ea92368.

This commit missed an error message and a line in the docs.
---
 doc/src/sgml/config.sgml  | 3 +--
 src/backend/utils/init/postinit.c | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 89d53f2a64..e019a1aac9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -725,8 +725,7 @@ include_dir 'conf.d'
 number of active concurrent connections is at least
 max_connections minus
 superuser_reserved_connections, new
-connections will be accepted only for superusers, and no
-new replication connections will be accepted.
+connections will be accepted only for superusers.

 

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index ae5a85ed65..9145d96b38 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -931,7 +931,7 @@ InitPostgres(const char *in_dbname, Oid dboid,
 		!HaveNFreeProcs(ReservedBackends))
 		ereport(FATAL,
 (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
- errmsg("remaining connection slots are reserved for non-replication superuser connections")));
+ errmsg("remaining connection slots are reserved for superusers")));
 
 	/* Check replication permissions needed for walsender processes. */
 	if (am_walsender)
-- 
2.25.1

>From e0390f0120315746ea04a9fa1bf709def76e6196 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 17 Jan 2023 13:58:56 -0800
Subject: [PATCH v5 2/3] Rename ReservedBackends to
 SuperuserReservedConnections.

This is in preparation for adding a new reserved_connections GUC.
---
 src/backend/postmaster/postmaster.c | 20 ++--
 src/backend/utils/init/postinit.c   |  4 ++--
 src/backend/utils/misc/guc_tables.c |  2 +-
 src/include/postmaster/postmaster.h |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..3f799c4ac8 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -204,15 +204,15 @@ char	   *Unix_socket_directories;
 char	   *ListenAddresses;
 
 /*
- * ReservedBackends is the number of backends reserved for superuser use.
- * This number is taken out of the pool size given by MaxConnections so
- * number of backend slots available to non-superusers is
- * (MaxConnections - ReservedBackends).  Note what this really means is
- * "if there are <= ReservedBackends connections available, only superusers
- * can make new connections" --- pre-existing superuser connections don't
- * count against the limit.
+ * SuperuserReservedConnections is the number of backends reserved for
+ * superuser use.  This number is taken out of the pool size given by
+ * MaxConnections so number of backend slots available to non-superusers is
+ * (MaxConnections - SuperuserReservedConnections).  Note what this really
+ * means is "if there are <= SuperuserReservedConnections connections
+ * available, only superusers can make new connections" --- pre-existing
+ * superuser connections don't count against the limit.
  */
-int			ReservedBackends;
+int			SuperuserReservedConnections;
 
 /* The socket(s) we're listening to. */
 #define MAXLISTEN	64
@@ -908,11 +908,11 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Check for invalid combinations of GUC settings.
 	 */
-	if (ReservedBackends >= MaxConnections)
+	if (SuperuserReservedConnections >= MaxConnections)
 	{
 		write_stderr("%s: superuser_reserved_connections (%d) must be less than max_connections (%d)\n",
 	 progname,
-	 ReservedBackends, MaxConnections);
+	 SuperuserReservedConnections, MaxConnections);
 		ExitPostmaster(1);
 	}
 	if (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_MINIMAL)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 9145d96b38..40f145e0ab 100644
--- 

Re: generic plans and "initial" pruning

2023-01-19 Thread Tom Lane
I spent some time re-reading this whole thread, and the more I read
the less happy I got.  We are adding a lot of complexity and introducing
coding hazards that will surely bite somebody someday.  And after a while
I had what felt like an epiphany: the whole problem arises because the
system is wrongly factored.  We should get rid of AcquireExecutorLocks
altogether, allowing the plancache to hand back a generic plan that
it's not certain of the validity of, and instead integrate the
responsibility for acquiring locks into executor startup.  It'd have
to be optional there, since we don't need new locks in the case of
executing a just-planned plan; but we can easily add another eflags
bit (EXEC_FLAG_GET_LOCKS or so).  Then there has to be a convention
whereby the ExecInitNode traversal can return an indicator that
"we failed because the plan is stale, please make a new plan".

There are a couple reasons why this feels like a good idea:

* There's no need for worry about keeping the locking decisions in sync
with what executor startup does.

* We don't need to add the overhead proposed in the current patch to
pass forward data about what got locked/pruned.  While that overhead
is hopefully less expensive than the locks it saved acquiring, it's
still overhead (and in some cases the patch will fail to save acquiring
any locks, making it certainly a net negative).

* In a successfully built execution state tree, there will simply
not be any nodes corresponding to pruned-away, never-locked subplans.
As long as code like EXPLAIN follows the state tree and doesn't poke
into plan nodes that have no matching state, it's secure against the
sort of problems that Robert worried about upthread.

While I've not attempted to write any code for this, I can also
think of a few issues that'd have to be resolved:

* We'd be pushing the responsibility for looping back and re-planning
out to fairly high-level calling code.  There are only half a dozen
callers of GetCachedPlan, so there's not that many places to be
touched; but in some of those places the subsequent executor-start call
is not close by, so that the necessary refactoring might be pretty
painful.  I doubt there's anything insurmountable, but we'd definitely
be changing some fundamental APIs.

* In some cases (views, at least) we need to acquire lock on relations
that aren't directly reflected anywhere in the plan tree.  So there'd
have to be a separate mechanism for getting those locks and rechecking
validity afterward.  A list of relevant relation OIDs might be enough
for that.

* We currently do ExecCheckPermissions() before initializing the
plan state tree.  It won't do to check permissions on relations we
haven't yet locked, so that responsibility would have to be moved.
Maybe that could also be integrated into the initialization recursion?
Not sure.

* In the existing usage of AcquireExecutorLocks, if we do decide
that the plan is stale then we are able to release all the locks
we got before we go off and replan.  I'm not certain if that behavior
needs to be preserved, but if it does then that would require some
additional bookkeeping in the executor.

* This approach is optimizing on the assumption that we usually
won't need to replan, because if we do then we might waste a fair
amount of executor startup overhead before discovering we have
to throw all that state away.  I think that's clearly the right
way to bet, but perhaps somebody else has a different view.

Thoughts?

regards, tom lane




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-19 Thread Andrew Dunstan


On 2023-01-18 We 17:14, Tom Lane wrote:
> Andrew Dunstan  writes:
>> I think we can do what you want but it's a bit harder than what you've
>> done. If we're not going to save the current run's product then we need
>> to run the upgrade test from a different directory (probably directly in
>> "$buildroot/$this_branch/inst"). Otherwise we'll be testing upgrade to
>> the saved product of a previous run of this branch.
> Hmm, maybe that explains some inconsistent results I remember getting.
>
>> I'll take a stab at it tomorrow if you like.
> Please do.
>
>   


See



I tested it and it seems to be doing the right thing.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Teach planner to further optimize sort in distinct

2023-01-19 Thread Ankit Kumar Pandey



On 19/01/23 18:49, David Rowley wrote:



I think you should write a function like:



bool pathkeys_count_contained_in_unordered(List *keys1, List *keys2,
List **reorderedkeys, int *n_common)



which works very similarly to pathkeys_count_contained_in, but
populates *reorderedkeys so it contains all of the keys in keys1, but
put the matching ones in the same order as they are in keys2.  If you
find a keys2 that does not exist in keys1 then just add the additional
unmatched keys1 keys to *reorderedkeys.  Set *n_common to the number
of common keys excluding any that come after a key2 key that does not
exist as a key1 key.



You can just switch to using that function in
create_final_distinct_paths(). You'll need to consider if the query is
a DISTINCT ON query and not try the unordered version of the function
in that case.


Tried this, it worked as expected. Tests are green as well.


I also just noticed that in build_index_paths() we'll leave the index
path's pathkeys empty if we deem the pathkeys as useless.  I'm not
sure what the repercussions of setting those to the return value of
build_index_pathkeys() if useful_pathkeys is otherwise empty.


This is very rigid indeed.


It's possible that truncate_useless_pathkeys() needs to be modified to
check if the pathkeys might be useful for DISTINCT 


We have pathkeys_useful_for_merging and pathkeys_useful_for_ordering.

Can we not have pathkeys_useful_for_distinct?

Also, pathkeys_useful_for_ordering calls pathkeys_count_contained_in.

We can add code path on similar lines.


When I
thought about it I assumed that we always set IndexPath's pathkeys to
whatever (if any) sort order that the index provides.


Can we not add the original path keys in IndexPath? It could be useful

at other places as well. At least, I can see it being useful in sorting cases.

makes me wonder if this entire patch is a good idea. 


We are still getting some benefit even without index paths for now.


I have attached patch with pathkeys_count_contained_in_unordered

and corresponding changes in create_final_distinct_paths for reference.


Thanks,

Ankit
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index d2e241c983..f0bc977eff 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -2014,3 +2014,52 @@ has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel)
 		return true;			/* might be able to use them for ordering */
 	return false;/* definitely useless */
 }
+
+/*
+ * pathkeys_count_contained_in_unordered
+ *		reorders keys1 to match keys2
+ * Populates *reorderedkeys so it contains all of the keys in keys1, but
+ * puts the matching ones in the same order as they are in keys2.  Keys
+ * in keys1 which do not match keys2 are appended at the end.
+ * n_common denotes the number of matched keys.
+ */
+bool
+pathkeys_count_contained_in_unordered(List *keys1, List *keys2,
+	  List **reorderedkeys, int *n_common)
+{
+	ListCell*	lc1;
+	ListCell*	lc2;
+	int			n = 0;
+	List*		unmatched_keys = NIL;
+
+	foreach(lc1, keys1)
+	{
+		bool matched = false;
+		PathKey*pathkey1 = (PathKey *) lfirst(lc1);
+		foreach(lc2, keys2)
+		{
+			PathKey*pathkey2 = (PathKey *) lfirst(lc2);
+			if (pathkey1 == pathkey2)
+			{
+*reorderedkeys = lappend(*reorderedkeys, pathkey2);
+n++;
+matched = true;
+break;
+			}
+
+		}
+		/*
+		 * If no match is found, collect path key for appending it later
+		 */
+		if (!matched)
+			unmatched_keys = lappend(unmatched_keys, pathkey1);
+	}
+
+	Assert(n == list_length(*reorderedkeys));
+
+	*reorderedkeys = list_concat_unique(*reorderedkeys, unmatched_keys);
+	Assert(list_length(*reorderedkeys) == list_length(keys1));
+	*n_common = n;
+
+	return *n_common == list_length(keys1);
+}
\ No newline at end of file
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 05f44faf6e..d9904b046a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4936,10 +4936,28 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 			Path	   *sorted_path;
 			bool		is_sorted;
 			int			presorted_keys;
+			List*		reordered_keys = NIL;
 
-			is_sorted = pathkeys_count_contained_in(needed_pathkeys,
+			/*
+			 * Attempt to optimize non distinct-on queries
+			 * if sorted keys are present but not in required order.
+			 * We can modify needed_path keys as per as input path key
+			 * ordering and reuse input sort.
+			 */
+			if (!parse->hasDistinctOn)
+			{
is_sorted = pathkeys_count_contained_in_unordered(needed_pathkeys,
	input_path->pathkeys,
	&reordered_keys,
	&presorted_keys);
+needed_pathkeys = reordered_keys;
+			}
+			else
+			{
is_sorted = pathkeys_count_contained_in(needed_pathkeys,
	input_path->pathkeys,
	&presorted_keys);
+			}
 
 			if (is_sorted)
 sorted_path = input_path;
diff --git a/src/include/optimizer/paths.h 

Re: almost-super-user problems that we haven't fixed yet

2023-01-19 Thread Robert Haas
On Thu, Jan 19, 2023 at 12:54 PM Nathan Bossart
 wrote:
> > OK. Might be worth a short comment.
>
> I added one.

Thanks. I'd move it to the inner indentation level so it's closer to
the test at issue.

I would also suggest reordering the documentation and the
postgresql.conf.sample file so that reserved_connections precedes
superuser_reserved_connections, instead of following it.

Other than that, this seems like it might be about ready to commit,
barring objections or bug reports.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Non-superuser subscription owners

2023-01-19 Thread Robert Haas
On Thu, Jan 19, 2023 at 1:40 PM Jeff Davis  wrote:
> On Thu, 2023-01-19 at 10:45 -0500, Robert Haas wrote:
> > I wouldn't be OK with writing our own connection string parser for
> > this purpose, but using PQconninfoParse seems OK. We still have to
> > embed knowledge of which connection string parameters can trigger
> > local file access, but that doesn't seem like a massive problem to
> > me.
>
> Another idea (I discussed with Andres some time ago) was to have an
> option to libpq to turn off file access entirely. That could be a new
> API function or a new connection option.
>
> That would be pretty valuable by itself. Though we might want to
> support a way to pass SSL keys as values rather than file paths, so
> that we can still do SSL.

Maybe all of that would be useful, but it doesn't seem that mandatory.

> So perhaps the answer is that it will be a small patch to get non-
> superuser subscription owners, but we need three or four preliminary
> patches first.

I guess I'm not quite seeing it. Why can't we write a small patch to
get this working right now, probably in a few hours, and deal with any
improvements that people want at a later time?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: HOT chain validation in verify_heapam()

2023-01-19 Thread Robert Haas
On Thu, Jan 19, 2023 at 8:55 AM Aleksander Alekseev
 wrote:
> I noticed that this patch stuck a little and decided to take another look.
>
> It seems to be well written, covered with tests and my understanding
> is that all the previous feedback was accounted for. To your knowledge
> is there anything that prevents us from moving it to "Ready for
> Committer"?

Thanks for taking a look, and for pinging the thread.

I think that the handling of lp_valid[] in the loop that begins with
"Loop over offset and populate predecessor array from all entries that
are present in successor array" is very confusing. I think that
lp_valid[] should be answering the question "is the line pointer
basically sane?". That is, if it's a redirect, it needs to point to
something within the line pointer array (and we also check that it
must be an entry in the line pointer array that is used, which seems
fine). If it's not a redirect, it needs to point to space that's
entirely within the block, properly aligned, and big enough to contain
a tuple. We determine the answers to all of these questions in the
first loop, the one that starts with /* Perform tuple checks */.

Nothing that happens in the second loop, where we populate the
predecessor array, can reverse our previous conclusion that the line
pointer is valid, so this loop shouldn't be resetting entries in
lp_valid[] to false. The reason that it's doing so seems to be that it
wants to use lp_valid[] to control the behavior of the third loop,
where we perform checks against things that have entries in the
predecessor array. As written, the code ensures that we always set
lp_valid[nextoffnum] to false unless we set predecessor[nextoffnum] to
a value other than InvalidOffsetNumber. But that is needlessly
complex: the third loop doesn't need to look at lp_valid[] at all. It
can just check whether predecessor[currentoffnum] is valid. If it is,
perform checks. Otherwise, skip it. It seems to me that this would be
significantly simpler.

To put the above complaint another way, a variable shouldn't mean two
different things depending on where you are in the function. Right
now, at the end of the first loop, lp_valid[x] answers the question
"is line pointer x basically valid?". But by the end of the second
loop, it answers the question "is line pointer x valid and does it
also have a valid predecessor?". That kind of definitional change is
something to be avoided.

The test if (pred_in_progress || TransactionIdDidCommit(curr_xmin))
seems wrong to me. Shouldn't it be &&? Has this code been tested at
all?  It doesn't seem to have a test case. Some of these other errors
don't, either. Maybe there's some that we can't easily test in an
automated way, but we should test what we can. I guess maybe casual
testing wouldn't reveal the problem here because of the recheck, but
it's worrying to find logic that doesn't look right with no
corresponding comments or test cases.

Some error message kibitzing:

 psprintf("redirected tuple at line pointer offset %u is not heap only tuple",

It seems to me that this should say "redirected line pointer pointing
to a non-heap-only tuple at offset %u". There is no such thing as a
redirected tuple -- and even if there were, what we have here is
clearly a redirected line pointer.

psprintf("redirected tuple at line pointer offset %u is not heap only tuple",

And I think for the same reasons this one should say something like
"redirected line pointer pointing to a non-heap-only tuple at offset
%u".

 psprintf("redirected tuple at line pointer offset %u is not heap
updated tuple",

And this one should say something like "redirected line pointer
pointing to a tuple not produced by an update at offset %u".

Possibly all of these would sound better with "points" rather than
"pointing" -- if so, we'd need to change an existing message in the
same way.

 psprintf("tuple is root of chain but it is marked as heap-only tuple"));

I think this would sound better if you deleted the word "it".

I don't know whether it's worth arguing about -- it feels like we've
argued too much already about this sort of thing -- but I am not very
convinced by initializers like OffsetNumber
predecessor[MaxOffsetNumber] = {InvalidOffsetNumber}. That style is
only correct because InvalidOffsetNumber happens to be zero. If it
were up to me, I'd use memset to clear the predecessor array. I would
not bulk initialize successor and lp_valid but make sure that the first
loop always sets them, possibly by having the top of the loop set them
to InvalidOffsetNumber and false initially and then letting code later
in the loop change the value, or possibly in some other way.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: add PROCESS_MAIN to VACUUM

2023-01-19 Thread Nathan Bossart
On Thu, Jan 19, 2023 at 05:28:25PM +0530, vignesh C wrote:
> The patch does not apply on top of HEAD as in [1], please post a rebased 
> patch:

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 7a7f96bf4eea5be6cc252dda6bc330e77a6a3316 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 29 Dec 2022 15:31:49 -0800
Subject: [PATCH v3 1/1] add PROCESS_MAIN to VACUUM

---
 doc/src/sgml/ref/vacuum.sgml | 13 +
 doc/src/sgml/ref/vacuumdb.sgml   | 15 +++
 src/backend/commands/vacuum.c| 28 ++--
 src/backend/postmaster/autovacuum.c  |  4 +++-
 src/bin/psql/tab-complete.c  |  4 ++--
 src/bin/scripts/t/100_vacuumdb.pl|  7 +++
 src/bin/scripts/vacuumdb.c   | 24 
 src/include/commands/vacuum.h|  1 +
 src/test/regress/expected/vacuum.out |  4 
 src/test/regress/sql/vacuum.sql  |  5 +
 10 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 545b23b54f..b6d30b5764 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -33,6 +33,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ boolean ]
 SKIP_LOCKED [ boolean ]
 INDEX_CLEANUP { AUTO | ON | OFF }
+PROCESS_MAIN [ boolean ]
 PROCESS_TOAST [ boolean ]
 TRUNCATE [ boolean ]
 PARALLEL integer
@@ -238,6 +239,18 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ defname, "process_toast") == 0)
 			process_toast = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "truncate") == 0)
@@ -226,7 +229,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		(disable_page_skipping ? VACOPT_DISABLE_PAGE_SKIPPING : 0) |
 		(process_toast ? VACOPT_PROCESS_TOAST : 0) |
 		(skip_database_stats ? VACOPT_SKIP_DATABASE_STATS : 0) |
-		(only_database_stats ? VACOPT_ONLY_DATABASE_STATS : 0);
+		(only_database_stats ? VACOPT_ONLY_DATABASE_STATS : 0) |
+		(process_main ? VACOPT_PROCESS_MAIN : 0);
 
 	/* sanity checks on options */
 	Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE));
@@ -367,9 +371,10 @@ vacuum(List *relations, VacuumParams *params,
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("ONLY_DATABASE_STATS cannot be specified with a list of tables")));
-		/* don't require people to turn off PROCESS_TOAST explicitly */
+		/* don't require people to turn off PROCESS_TOAST/MAIN explicitly */
 		if (params->options & ~(VACOPT_VACUUM |
 VACOPT_VERBOSE |
+VACOPT_PROCESS_MAIN |
 VACOPT_PROCESS_TOAST |
 VACOPT_ONLY_DATABASE_STATS))
 			ereport(ERROR,
@@ -2031,10 +2036,12 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params, bool skip_privs)
 	/*
 	 * Remember the relation's TOAST relation for later, if the caller asked
 	 * us to process it.  In VACUUM FULL, though, the toast table is
-	 * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
+	 * automatically rebuilt by cluster_rel so we shouldn't recurse to it
+	 * unless PROCESS_MAIN is disabled.
 	 */
 	if ((params->options & VACOPT_PROCESS_TOAST) != 0 &&
-		(params->options & VACOPT_FULL) == 0)
+		((params->options & VACOPT_FULL) == 0 ||
+		 (params->options & VACOPT_PROCESS_MAIN) == 0))
 		toast_relid = rel->rd_rel->reltoastrelid;
 	else
 		toast_relid = InvalidOid;
@@ -2053,7 +2060,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params, bool skip_privs)
 	/*
 	 * Do the actual work --- either FULL or "lazy" vacuum
 	 */
-	if (params->options & VACOPT_FULL)
+	if (params->options & VACOPT_FULL &&
+		params->options & VACOPT_PROCESS_MAIN)
 	{
 		ClusterParams cluster_params = {0};
 
@@ -2067,7 +2075,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params, bool skip_privs)
 		/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 		cluster_rel(relid, InvalidOid, _params);
 	}
-	else
+	else if (params->options & VACOPT_PROCESS_MAIN)
 		table_relation_vacuum(rel, params, vac_strategy);
 
 	/* Roll back any GUC changes executed by index functions */
@@ -2094,7 +2102,15 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params, bool skip_privs)
 	 * totally unimportant for toast relations.
 	 */
 	if (toast_relid != InvalidOid)
+	{
+		/* we force VACOPT_PROCESS_MAIN so vacuum_rel() processes it */
+		bool force_opt = ((params->options & VACOPT_PROCESS_MAIN) == 0);
+
+		params->options |= VACOPT_PROCESS_MAIN;
 		vacuum_rel(toast_relid, NULL, params, true);
+		if (force_opt)
+			params->options &= ~VACOPT_PROCESS_MAIN;
+	}
 
 	/*
 	 * Now release the session-level lock on the main table.
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f5ea381c53..12dcb2b762 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2860,7 +2860,9 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		 * skip 

Re: Non-superuser subscription owners

2023-01-19 Thread Jeff Davis
On Thu, 2023-01-19 at 10:45 -0500, Robert Haas wrote:
> I wouldn't be OK with writing our own connection string parser for
> this purpose, but using PQconninfoParse seems OK. We still have to
> embed knowledge of which connection string parameters can trigger
> local file access, but that doesn't seem like a massive problem to
> me.

Another idea (I discussed with Andres some time ago) was to have an
option to libpq to turn off file access entirely. That could be a new
API function or a new connection option.

That would be pretty valuable by itself. Though we might want to
support a way to pass SSL keys as values rather than file paths, so
that we can still do SSL.

So perhaps the answer is that it will be a small patch to get non-
superuser subscription owners, but we need three or four preliminary
patches first.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: BF animal malleefowl reported an failure in 001_password.pl

2023-01-19 Thread Thomas Munro
On Tue, Jan 17, 2023 at 11:24 AM Thomas Munro  wrote:
> Another idea would be to teach the latch infrastructure itself to
> magically swap latch events to position 0.  Latches are usually
> prioritised; it's only in this rare race case that they are not.

I liked that idea for a while, but I suspect it is not really possible
to solve the problem completely this way, because it won't work on
Windows (see below) and the race I described earlier is probably not
the only one.  I think it must also be possible for poll() to ignore a
signal that becomes pending just as the system call begins and return
a socket fd that has also just become ready, without waiting (thus not
causing EINTR).  Then the handler would run after we return to
userspace, we'd see only the socket event, and a later call would see
the latch event.

So I think we probably need something like the attached, which I was
originally trying to avoid.

Looking into all that made me notice a related problem on Windows.
There's an interesting difference between the implementation of
select() in src/backend/port/win32/socket.c and the Windows
implementation of WaitEventSetBlock() in latch.c.  The latch.c code
only reports one event at a time, in event array order, because that's
WaitForMultipleObjects()'s contract and we expose that fairly
directly.  The older socket.c code uses that only for wakeup, and then
it polls *all* sockets to be able to report more than one at a time.
I was careful to use a large array of output events to preserve the
existing round-robin servicing of multiple server sockets, but I see
now that that only works on Unix.  On Windows, I suspect that one
socket receiving a fast enough stream of new connections could prevent
a second socket from ever being serviced.  I think we might want to do
something about that.
From 8b08f138a3286503e339b546d758ef683514a929 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 20 Jan 2023 05:50:39 +1300
Subject: [PATCH] Process pending postmaster work before connections.

Modify the new event loop code from commit 7389aad6 so that it checks
for work requested by signal handlers even if it doesn't see a latch
event yet.

This gives priority to shutdown and reload requests where the latch will
be reported later in the event array, or in a later call to
WaitEventSetWait(), due to scheduling details.  In particular, this
guarantees that a SIGHUP-then-connect sequence (as seen in
authentication tests) causes the postmaster to process the reload before
accepting the connection.  If the WaitEventSetWait() call saw the socket
as ready, and the reload signal was generated before the connection,
then the latest time the signal handler should be able to run is after
poll/epoll_wait/kevent returns but before we check the
pending_pm_reload_request flag.

Reported-by: Hou Zhijie 
Reported-by: Tom Lane 
Discussion: https://postgr.es/m/OS0PR01MB57163D3BF2AB42ECAA94E5C394C29%40OS0PR01MB5716.jpnprd01.prod.outlook.com

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..cec3fe8dc5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1739,20 +1739,24 @@ ServerLoop(void)
 		for (int i = 0; i < nevents; i++)
 		{
 			if (events[i].events & WL_LATCH_SET)
-			{
 ResetLatch(MyLatch);
 
-/* Process work requested via signal handlers. */
-if (pending_pm_shutdown_request)
-	process_pm_shutdown_request();
-if (pending_pm_child_exit)
-	process_pm_child_exit();
-if (pending_pm_reload_request)
-	process_pm_reload_request();
-if (pending_pm_pmsignal)
-	process_pm_pmsignal();
-			}
-			else if (events[i].events & WL_SOCKET_ACCEPT)
+			/*
+			 * The following is done unconditionally, even if we didn't see
+			 * WL_LATCH_SET.  This gives high priority to shutdown and reload
+			 * requests where the latch happens to appear later in events[] or
+			 * will be reported by a later call to WaitEventSetWait().
+			 */
+			if (pending_pm_shutdown_request)
+process_pm_shutdown_request();
+			if (pending_pm_child_exit)
+process_pm_child_exit();
+			if (pending_pm_reload_request)
+process_pm_reload_request();
+			if (pending_pm_pmsignal)
+process_pm_pmsignal();
+
+			if (events[i].events & WL_SOCKET_ACCEPT)
 			{
 Port	   *port;
 
-- 
2.35.1



Re: Non-superuser subscription owners

2023-01-19 Thread Jeff Davis
On Wed, 2023-01-18 at 14:38 -0500, Robert Haas wrote:
> I was just noticing that what was committed here didn't actually fix
> the problem implied by the subject line. That is, non-superuser still
> can't own subscriptions. To put that another way, there's no way for
> the superuser to delegate the setup and administration of logical
> replication to a non-superuser. That's a bummer.

Right, though as Mark pointed out, it does accomplish something even if
it's a bit unsatisfying. We could certainly do better here.

> 2. There was also quite a bit of discussion of what to do if a user
> who was previously eligible to own a subscription ceases to be
> eligible, in particular around a superuser who is made into a
> non-superuser, but the same problem would apply

Correct, that's not a new problem, but exists in only a few places now.
Our privilege system is focused on "what action can the user take right
now?", and gets weirder when it comes to object ownership, which is a
more permanent thing.

Extending that system to a subscription object, which has its own
capabilities including a long-lived process, is cause for some
hesitation. I agree it's not necessarily a blocker.

> 3. There was a bit of discussion of maybe wanting to allow users to
> create subscriptions with some connection strings but not others,

This was an alternative to trying to sanitize connection strings,
because it's a bit difficult to reason about what might be "safe"
connection strings for a non-superuser, because it's environment-
dependent. But if we do identify a reasonable set of sanitization
rules, we can proceed without 3.

> I am very curious to know (a) why work on this was abandoned (perhaps
> the answer is just lack of round tuits, in which case there is no
> more
> to be said), and (b) what people think of (1)-(3) above, and (c)
> whether anyone knows of further problems that need to be considered
> here.

(a) Mostly round-tuits. There are problems and questions, but that's
true of any work, and they could be solved. Or, if they don't turn out to
be terribly serious, we could ignore them.

(b) When I pick this up again I would be inclined towards the
following: try to solve 4-5 (listed below) first, which are
independently useful; then look at both 1 and 3 to see which one
presents an agreeable solution faster. I'll probably ignore 2 because I
couldn't get agreement the last time around (I think Mark objected to
the idea of denying a drop in privileges).

(c) Let me add:

4. There are still differences between the subscription worker applying
a change and going through the ordinary INSERT paths, for instance with
RLS. Also solvable.

5. Andres raised in another thread the idea of switching to the table
owner when applying changes (perhaps in a
SECURITY_RESTRICTED_OPERATION?): 

https://www.postgresql.org/message-id/20230112033355.u5tiyr2bmuoc4...@awork3.anarazel.de

That seems related, and I like the idea.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Remove source code display from \df+?

2023-01-19 Thread Isaac Morland
On Thu, 19 Jan 2023 at 11:30, Justin Pryzby  wrote:

> On Wed, Jan 18, 2023 at 10:27:46AM -0500, Isaac Morland wrote:
> >
> > I thought I had: https://commitfest.postgresql.org/42/4133/
>
> This is failing tests:
> http://cfbot.cputube.org/isaac-morland.html
>
> It seems like any "make check" would fail .. but did you also try
> cirrusci from your own github account?
> ./src/tools/ci/README
>

I definitely ran "make check" but I did not realize there is also cirrusci.
I will look at that. I'm having trouble interpreting the cfbot page to
which you linked but I'll try to run cirrusci myself before worrying too
much about that.

Running "make check" the first time I was surprised to see no failures - so
I added tests for \df+, which passed when I did "make check".


> BTW, I think "ELSE NULL" could be omitted.
>

Looks like; I'll update. Might as well reduce the visual size of the code.


Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

2023-01-19 Thread Karl O. Pinc
On Thu, 19 Jan 2023 11:03:53 -0600
"Karl O. Pinc"  wrote:

> Attached are 2 patches, a regular and a delta from your v4 review:
> 
> contrib_v5-delta.patch.txt
> contrib_v5.patch.txt

I left your appendix title unchanged: "Additional Supplied 
Extensions and Modules".  

I had put "Extensions" after
"Modules", because, apparently, things that come last in the
sentence are most remembered by the reader. My impression is that
more people are looking for extensions than modules.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Add LZ4 compression in pg_dump

2023-01-19 Thread Tomas Vondra
Hi,

On 1/19/23 17:42, gkokola...@pm.me wrote:
> 
> --- Original Message ---
> On Thursday, January 19th, 2023 at 4:45 PM, Tomas Vondra 
>  wrote:
>>
>> On 1/18/23 20:05, gkokola...@pm.me wrote:
>>
>>> --- Original Message ---
>>> On Wednesday, January 18th, 2023 at 3:00 PM, Tomas Vondra 
>>> tomas.von...@enterprisedb.com wrote:
>>
>> I'm not sure I understand why leave the lz4/zstd in this place?
> 
> You are right, it is not obvious. Those were added in 5e73a60488 which is
> already committed in master and I didn't want to backtrack. Of course, I am
> not opposing in doing so if you wish.
> 

Ah, I didn't realize it was already added by earlier commit. In that
case let's not worry about it.

>>
 2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep
 "none" at the end. It might make backpatches harder.
>>>
>>> Agreed. However a 'default' is needed in order to avoid compilation 
>>> warnings.
>>> Also note that 0002 completely does away with cases within 
>>> WriteDataToArchive.
>>
>>
>> OK, although that's also a consequence of using a "switch" instead of
>> plain "if" branches.
>>
>> Furthermore, I'm not sure we really need the pg_fatal() about invalid
>> compression method in these default blocks. I mean, how could we even
>> get to these places when the build does not support the algorithm? All
>> of this (ReadDataFromArchive, WriteDataToArchive, EndCompressor, ...)
>> happens looong after the compressor was initialized and the method
>> checked, no? So maybe either this should simply do Assert(false) or use
>> a different error message.
> 
> I like Assert(false).
> 

OK, good. Do you agree we should never actually get there, if the
earlier checks work correctly?

>>
 4) "cfp" struct no longer wraps gzFile, but the comment was not updated.
 FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be
 better to have a "union" of correct types?
>>>
>>> Please find an updated comment and a union in place of the void *. Also
>>> note that 0002 completely does away with cfp in favour of a new struct
>>> CompressFileHandle. I maintained the void * there because it is used by
>>> private methods of the compressors. 0003 contains such an example with
>>> LZ4CompressorState.
>>
>>
>> I wonder whether, given this (and also the previous item), it makes sense
>> to keep 0001 and 0002 separate or to combine them. The "intermediate"
>> state is a bit annoying.
> 
> Agreed. It was initially submitted as one patch. Then it was requested to be
> split into two parts, one to expand the use of the existing API and one to
> replace with the new interface. Unfortunately the expansion of usage of the
> existing API requires some tweaking, but that is not a very good reason for
> the current patch set. I should have done a better job there.
> 
> Please find v22 attached, which combines back 0001 and 0002. It is missing the
> documentation that was discussed above as I wanted to give a quick feedback.
> Let me know if you think that the combined version is the one to move forward
> with.
> 

Thanks, I'll take a look.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Support logical replication of DDLs

2023-01-19 Thread Zheng Li
On Thu, Jan 19, 2023 at 2:05 AM Amit Kapila  wrote:
>
> On Thu, Jan 19, 2023 at 8:39 AM Zheng Li  wrote:
> >
> > On Wed, Jan 18, 2023 at 6:27 AM Amit Kapila  wrote:
> > >
> > > On Sat, Jan 7, 2023 at 8:58 PM Zheng Li  wrote:
> > > >
> >
> > Foreign Tables can also be considered replicated with DDL replication 
> > because we
> > don't even need to replicate the data as it resides on the external
> > server. Users
> > need to configure the external server to allow connection from the
> > subscriber for
> > foreign tables to work on the subscriber.
> >
>
> So, this would mean that we expect the subscriber will also have the
> same foreign server as the publisher because we will replicate the
> entire connection/user information of the foreign server for the
> publisher.

Yes, CREATE/ALTER SERVER commands are also supported by the current
DDL replication patch.

>But what about data inserted by the publisher on the
> foreign server?

I thought the data inserted into a foreign table will always be stored
on the foreign server unless I'm mistaken?

> > > We should also think
> > > about initial sync for all those objects as well.
> >
> > Agree, we're starting an investigation on initial sync. But I think
> > initial sync depends on
> > DDL replication to work reliably, not the other way around. DDL replication 
> > can
> > work on its own without the initial sync of schema, users just need to
> > set up the initial
> > schema just like they would today.
> >
>
> The difference is that today users need to take care of all schema
> setup on both and follow changes in the same on the publisher. But
> with DDL replication, there has to be a point prior to which both the
> nodes have the same setup. For that, before setting up DDL
> replication, users need to ensure that both nodes have the same
> schema, and then during setup, the user doesn't perform any DDL on the
> publisher.

The users can perform DDL during the setup if they do the following:
1. Create a logical replication slot to capture changes on the publisher
2. Do a backup for the publisher
3. Restore the backup as the subscriber
4. Advance the logical slot to the last valid LSN of the restore
5. Create pub/sub and use the above logical slot.
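A rough sketch of those five steps in SQL, assuming hypothetical slot, publication, and subscription names and an illustrative LSN (the backup/restore in steps 2 and 3 happens outside SQL, e.g. with pg_basebackup):

```sql
-- Illustrative sketch of the sequence above; 'setup_slot', 'setup_pub',
-- 'setup_sub', the connection string, and the LSN are all placeholders.

-- 1. On the publisher: create a slot so changes are retained from here on.
SELECT pg_create_logical_replication_slot('setup_slot', 'pgoutput');

-- 2./3. Take a backup of the publisher and restore it as the subscriber
--       (done outside SQL).

-- 4. Advance the slot past everything already contained in the backup.
SELECT pg_replication_slot_advance('setup_slot', '0/16B3748'::pg_lsn);

-- 5. Create the publication and a subscription that reuses the slot;
--    copy_data = false because the restore already carried the data.
CREATE PUBLICATION setup_pub FOR ALL TABLES;   -- on the publisher
CREATE SUBSCRIPTION setup_sub                  -- on the subscriber
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION setup_pub
    WITH (create_slot = false, slot_name = 'setup_slot',
          copy_data = false);
```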

Regards,
Zane




Re: almost-super-user problems that we haven't fixed yet

2023-01-19 Thread Nathan Bossart
On Thu, Jan 19, 2023 at 11:40:53AM -0500, Robert Haas wrote:
> On Wed, Jan 18, 2023 at 4:14 PM Nathan Bossart  
> wrote:
>> On Wed, Jan 18, 2023 at 02:51:38PM -0500, Robert Haas wrote:
>> > Should (nfree < SuperuserReservedBackends) be using <=, or am I confused?
>>
>> I believe < is correct.  At this point, the new backend will have already
>> claimed a proc struct, so if the number of remaining free slots equals the
>> number of reserved slots, it is okay.
> 
> OK. Might be worth a short comment.

I added one.

>> > What's the deal with removing "and no new replication connections will
>> > be accepted" from the documentation? Is the existing documentation
>> > just wrong? If so, should we fix that first? And maybe delete
>> > "non-replication" from the error message that says "remaining
>> > connection slots are reserved for non-replication superuser
>> > connections"? It seems like right now the comments say that
>> > replication connections are a completely separate pool of connections,
>> > but the documentation and the error message make it sound otherwise.
>> > If that's true, then one of them is wrong, and I think it's the
>> > docs/error message. Or am I just misreading it?
>>
>> I think you are right.  This seems to have been missed in ea92368.  I moved
>> this part to a new patch that should probably be back-patched to v12.
> 
> I'm inclined to commit it to master and not back-patch. It doesn't
> seem important enough to perturb translations.

That seems reasonable to me.

> Tushar seems to have a point about pg_use_reserved_connections vs.
> pg_use_reserved_backends. I think we should standardize on the former,
> as backends is an internal term.

Oops.  This is what I meant to do.  I probably flubbed it because I was
wondering why the parameter uses "connections" and the variable uses
"backends," especially considering that the variable for max_connections is
called MaxConnections.  I went ahead and renamed everything to use
"connections."

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 7be7e70aaf488a924d61b21b351a3b4f7e33aedc Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 18 Jan 2023 12:43:41 -0800
Subject: [PATCH v4 1/3] Code review for ea92368.

This commit missed an error message and a line in the docs.
---
 doc/src/sgml/config.sgml  | 3 +--
 src/backend/utils/init/postinit.c | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 89d53f2a64..e019a1aac9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -725,8 +725,7 @@ include_dir 'conf.d'
 number of active concurrent connections is at least
 max_connections minus
 superuser_reserved_connections, new
-connections will be accepted only for superusers, and no
-new replication connections will be accepted.
+connections will be accepted only for superusers.

 

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index ae5a85ed65..9145d96b38 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -931,7 +931,7 @@ InitPostgres(const char *in_dbname, Oid dboid,
 		!HaveNFreeProcs(ReservedBackends))
 		ereport(FATAL,
 (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
- errmsg("remaining connection slots are reserved for non-replication superuser connections")));
+ errmsg("remaining connection slots are reserved for superusers")));
 
 	/* Check replication permissions needed for walsender processes. */
 	if (am_walsender)
-- 
2.25.1

>From 058df2b3dcf50ecbe76c794f4f52751e6a9f765f Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 17 Jan 2023 13:58:56 -0800
Subject: [PATCH v4 2/3] Rename ReservedBackends to
 SuperuserReservedConnections.

This is in preparation for adding a new reserved_connections GUC.
---
 src/backend/postmaster/postmaster.c | 20 ++--
 src/backend/utils/init/postinit.c   |  4 ++--
 src/backend/utils/misc/guc_tables.c |  2 +-
 src/include/postmaster/postmaster.h |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..3f799c4ac8 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -204,15 +204,15 @@ char	   *Unix_socket_directories;
 char	   *ListenAddresses;
 
 /*
- * ReservedBackends is the number of backends reserved for superuser use.
- * This number is taken out of the pool size given by MaxConnections so
- * number of backend slots available to non-superusers is
- * (MaxConnections - ReservedBackends).  Note what this really means is
- * "if there are <= ReservedBackends connections available, only superusers
- * can make new connections" --- pre-existing superuser connections don't
- * count against the limit.
+ * SuperuserReservedConnections is the number of backends reserved for
+ * 

Re: Add semi-join pushdown to postgres_fdw

2023-01-19 Thread Tomas Vondra
Hi.

I took a quick look at the patch. It needs a rebase, although it applies
fine using patch.

A couple minor comments:

1) addl_conds seems a bit hard to understand, I'd use either the full
wording (additional_conds) or maybe extra_conds

2) some of the lines got quite long, and need a wrap

3) unknown_subquery_rels name is a bit misleading - AFAIK it's the rels
that can't be referenced from upper rels (per what the .h says). So they
are known, but hidden. Is there a better name?

4) joinrel_target_ok() needs a better comment, explaining *when* the
reltarget is safe for pushdown. The conditions are on the same line, but
the project style is to break after '&&'.

Also, I'd write

if (!IsA(var, Var))
continue;

which saves one level of nesting. IMHO that makes it more readable.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: minor bug

2023-01-19 Thread Tom Lane
=?UTF-8?Q?Torsten_F=C3=B6rtsch?=  writes:
> Why not

> (void) getRecordTimestamp(record, &recordXtime);
> if (recoveryTarget == RECOVERY_TARGET_TIME)
> ...

Could've done it like that, but I already pushed the other
version, and I don't think it's worth the trouble to change.

regards, tom lane




Re: minor bug

2023-01-19 Thread Tom Lane
Laurenz Albe  writes:
> On Wed, 2023-01-18 at 15:03 -0500, Tom Lane wrote:
>> Ah, but that only happens if recoveryTarget == RECOVERY_TARGET_TIME.
>> Digging in the git history, I see that this did use to work as
>> I remember: we always extracted the record time before printing it.
>> That was accidentally broken in refactoring in c945af80c.  I think
>> the correct fix is more like the attached.

> Yes, you are right.  Your patch looks fine to me.

Pushed then.  Thanks for the report!

regards, tom lane




Re: minor bug

2023-01-19 Thread Torsten Förtsch
If we never expect getRecordTimestamp to fail, then why put it in the
if-condition?

getRecordTimestamp can fail if the record is neither a restore point nor a
commit or abort record. A few lines before in the same function there is
this:

    /* Otherwise we only consider stopping before COMMIT or ABORT records. */
    if (XLogRecGetRmid(record) != RM_XACT_ID)
        return false;

I guess that makes sure getRecordTimestamp can never fail.

The way it is written in your patch invites it to be optimized out again.
The only thing that prevents it is the comment.

Why not

(void) getRecordTimestamp(record, &recordXtime);
if (recoveryTarget == RECOVERY_TARGET_TIME)
...




On Wed, Jan 18, 2023 at 9:03 PM Tom Lane  wrote:

> Laurenz Albe  writes:
> > On Tue, 2023-01-17 at 10:32 -0500, Tom Lane wrote:
> >> I seem to recall that the original idea was to report the timestamp
> >> of the commit/abort record we are stopping at.  Maybe my memory is
> >> faulty, but I think that'd be significantly more useful than the
> >> current behavior, *especially* when the replay stopping point is
> >> defined by something other than time.
> >> (Also, the wording of the log message suggests that that's what
> >> the reported time is supposed to be.  I wonder if somebody messed
> >> this up somewhere along the way.)
>
> > recoveryStopTime is set to recordXtime (the time of the xlog record)
> > a few lines above that patch, so this is useful information if it is
> > present.
>
> Ah, but that only happens if recoveryTarget == RECOVERY_TARGET_TIME.
> Digging in the git history, I see that this did use to work as
> I remember: we always extracted the record time before printing it.
> That was accidentally broken in refactoring in c945af80c.  I think
> the correct fix is more like the attached.
>
> regards, tom lane
>
>


Re: vac_update_datfrozenxid will raise "wrong tuple length" if pg_database tuple contains toast attribute.

2023-01-19 Thread Arthur Nascimento
On Thu, 19 Jan 2023 at 14:10, Nathan Bossart  wrote:
> This was recently back-patched [0] [1] [2].

Oh, I see that now. Thanks! Sorry for the noise.

--
Arthur Nascimento
EDB




Re: vac_update_datfrozenxid will raise "wrong tuple length" if pg_database tuple contains toast attribute.

2023-01-19 Thread Nathan Bossart
On Thu, Jan 19, 2023 at 10:42:47AM -0300, Arthur Nascimento wrote:
> Would it be possible to backport the patch to 12 and 13?

This was recently back-patched [0] [1] [2].

[0] https://postgr.es/m/y70xnvbuwqsr2...@paquier.xyz
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c0ee694
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=72b6098

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



